jiangjiangtian commented on issue #5253:
URL:
https://github.com/apache/incubator-gluten/issues/5253#issuecomment-2309581597
@PHILO-HE @NEUpanning @kecookier
I found a SQL query that produces a result mismatch:
```
select
get_json_object('{"businessCode":"xxx","msgId":"12","msgTime":23,"assistantId":34,"assistantAccount":"xxx","friend":{"wwUserId":"xxx","nickname":"\\๑","unionId":"xxxx","avatar":"xxx","userType":1,"wwCorpId":"","wwCorpName":"","wwAccount":""},"message":{"contentType":1,"content":"xxx"}}',
'$.friend.unionId');
```
Gluten returns `xxxx`, but Spark returns `NULL`.
The reason is that when Spark's JSON parser (Jackson) encounters a `\`, it
checks the character that follows to see whether the two form a valid escape
sequence. In this case, `\๑` is not a valid escape sequence, so parsing fails
and Spark returns `NULL`.
> https://github.com/FasterXML/jackson-core/blob/8744bd42770c9e277d995ef00fb518940efef3ef/src/main/java/com/fasterxml/jackson/core/json/ReaderBasedJsonParser.java#L2648-#L2699
In `simdjson`, however, no such check is performed on this path, so the
invalid escape goes unnoticed and the value is still extracted.
To fix this issue, I think we need to add code in `simdjson` to perform the
same check.
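
To make the check concrete, here is a minimal Python sketch of the kind of
escape validation described above. It is not Jackson's or `simdjson`'s code;
the function name `has_only_valid_escapes` and the constants are hypothetical,
and the set of allowed escapes is taken from the JSON grammar (RFC 8259). It
also assumes Spark's default string-literal handling, where the `\\` in the
SQL literal reaches the JSON parser as a single `\`.

```python
# Characters that may legally follow a backslash inside a JSON string.
VALID_ESCAPES = set('"\\/bfnrtu')
HEX_DIGITS = set("0123456789abcdefABCDEF")


def has_only_valid_escapes(raw: str) -> bool:
    """Return True if every backslash in `raw` starts a valid JSON escape.

    `raw` is the still-escaped text between the quotes of a JSON string.
    """
    i = 0
    while i < len(raw):
        if raw[i] != "\\":
            i += 1
            continue
        # A backslash must be followed by at least one character.
        if i + 1 >= len(raw):
            return False
        nxt = raw[i + 1]
        if nxt not in VALID_ESCAPES:
            return False
        if nxt == "u":
            # \u must be followed by exactly four hex digits.
            hex_part = raw[i + 2 : i + 6]
            if len(hex_part) != 4 or any(c not in HEX_DIGITS for c in hex_part):
                return False
            i += 6
        else:
            i += 2
    return True


# The nickname value from the query above: a backslash followed by "๑".
nickname_is_valid = has_only_valid_escapes("\\๑")
```

With such a check in place, the `nickname` value would be rejected, and the
whole document could then be treated as malformed, which matches the `NULL`
that Spark returns.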
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]