Re: [I] [VL] Results are mismatch with vanilla Spark on release-1.1 when use get_json_object operator [incubator-gluten]

via GitHub Mon, 26 Aug 2024 00:56:07 -0700


jiangjiangtian commented on issue #5253:
URL: 
https://github.com/apache/incubator-gluten/issues/5253#issuecomment-2309581597


   @PHILO-HE @NEUpanning @kecookier 
   I found a SQL query that produces a result mismatch:
   ```
   select 
get_json_object('{"businessCode":"xxx","msgId":"12","msgTime":23,"assistantId":34,"assistantAccount":"xxx","friend":{"wwUserId":"xxx","nickname":"\\๑","unionId":"xxxx","avatar":"xxx","userType":1,"wwCorpId":"","wwCorpName":"","wwAccount":""},"message":{"contentType":1,"content":"xxx"}}',
 '$.friend.unionId');
   ```
   Gluten returns `xxxx`, but Spark returns `NULL`.
   The reason is that when Spark's JSON parser (Jackson) encounters a `\`, it 
checks the following character to see whether the two form a valid escape 
sequence. Here, `\๑` is not a valid escape sequence, so Spark returns `NULL`.
   > 
https://github.com/FasterXML/jackson-core/blob/8744bd42770c9e277d995ef00fb518940efef3ef/src/main/java/com/fasterxml/jackson/core/json/ReaderBasedJsonParser.java#L2648-#L2699
   
   `simdjson`, however, does not perform this extra check.
   So to fix this issue, I think we need to add code in `simdjson` to perform 
the same check.
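
   To make the missing check concrete, here is a minimal sketch in Python of the kind of escape validation described above. It is only an illustration: the helper name `has_invalid_escape` is hypothetical, and this is neither Jackson's nor `simdjson`'s actual code. It assumes the standard JSON escape set (`"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, and `u` followed by four hex digits).

```python
# Characters that may follow a backslash in a JSON string (besides 'u').
VALID_SIMPLE_ESCAPES = set('"\\/bfnrt')
HEX_DIGITS = set("0123456789abcdefABCDEF")


def has_invalid_escape(json_text: str) -> bool:
    """Return True if a backslash inside a string literal starts an invalid escape.

    Hypothetical helper for illustration only; it scans string literals and
    does not validate the rest of the JSON grammar.
    """
    in_string = False
    i = 0
    n = len(json_text)
    while i < n:
        ch = json_text[i]
        if not in_string:
            # Outside a string literal only an opening quote matters here.
            if ch == '"':
                in_string = True
            i += 1
        elif ch == '"':
            # Unescaped quote closes the string literal.
            in_string = False
            i += 1
        elif ch == "\\":
            if i + 1 >= n:
                return True  # dangling backslash at end of input
            nxt = json_text[i + 1]
            if nxt in VALID_SIMPLE_ESCAPES:
                i += 2
            elif nxt == "u":
                # \u must be followed by exactly four hex digits.
                hex_part = json_text[i + 2 : i + 6]
                if len(hex_part) != 4 or any(c not in HEX_DIGITS for c in hex_part):
                    return True
                i += 6
            else:
                return True  # e.g. a backslash followed by '๑'
        else:
            i += 1
    return False


if __name__ == "__main__":
    # The Python literal "\\๑" is a single backslash followed by '๑'.
    doc = '{"friend":{"nickname":"\\๑","unionId":"xxxx"}}'
    print(has_invalid_escape(doc))
```

   With a check like this, a document containing `\๑` would be rejected as a whole, so a lookup such as `$.friend.unionId` would yield `NULL` instead of `xxxx`.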


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

