dzcxzl created SPARK-23603:
------------------------------

             Summary: When the length of the json is in a certain range, get_json_object 
will result in missing tail data
                 Key: SPARK-23603
                 URL: https://issues.apache.org/jira/browse/SPARK-23603
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0, 2.2.0, 2.0.0
            Reporter: dzcxzl


Jackson 2.7.7 and later fix a bug in which the tail of a value could be dropped 
when the length of the value falls within a certain range:

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]

[https://github.com/FasterXML/jackson-core/issues/307]

 

spark-shell:

 
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object('" + json + "', '$.big'))").collect

res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
Expected result: 3000. The returned length is 2991, so the last 9 characters of the value are missing.
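
To see which value lengths are affected on a given build, the reproduction above can be wrapped in a small scan. This is only a sketch: it assumes a spark-shell session where `spark` is already defined, the helper name `extractedLength` is hypothetical, and the bounds 2900 to 3100 are an arbitrary window around the reported case, not a range stated in this report.

```scala
// Assumes a spark-shell session, where `spark` is the predefined SparkSession.
// `extractedLength` is a hypothetical helper introduced only for this sketch.
def extractedLength(n: Int): Option[Int] = {
  val value = "x" * n
  val json = s"""{"big": "$value"}"""
  val row = spark.sql(
    "select length(get_json_object('" + json + "', '$.big'))").collect()(0)
  // length(...) is NULL if get_json_object returned NULL
  if (row.isNullAt(0)) None else Some(row.getInt(0))
}

// The scan bounds are illustrative assumptions, not taken from the report.
val affected = (2900 to 3100).filter(n => !extractedLength(n).contains(n))
println(s"lengths with a mismatch: ${affected.mkString(", ")}")
```

An empty list would suggest that the build in use already includes a fixed Jackson version or a workaround.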

 

 

There are two possible solutions:
1. Bump the Jackson version to 2.7.7 (or later).
2. Replace the call to writeRaw(char[] text, int offset, int len) with writeRaw(String text).
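
The second option could look roughly like the following when written directly against the Jackson streaming API. This is a minimal sketch, not the actual Spark code: it assumes jackson-core is on the classpath (as it is in spark-shell) and that the value is copied from a parser to a UTF-8 generator backed by an output stream. All variable names are illustrative.

```scala
import java.io.ByteArrayOutputStream
import com.fasterxml.jackson.core.{JsonEncoding, JsonFactory, JsonToken}

val factory = new JsonFactory()
val json = s"""{"big": "${"x" * 3000}"}"""

val parser = factory.createParser(json)
val out = new ByteArrayOutputStream()
// A UTF-8 generator over an OutputStream (assumed setup for this sketch).
val generator = factory.createGenerator(out, JsonEncoding.UTF8)

parser.nextToken()              // START_OBJECT
parser.nextToken()              // FIELD_NAME "big"
val token = parser.nextToken()  // the string value
require(token == JsonToken.VALUE_STRING)

// Variant named in the report as affected:
//   generator.writeRaw(parser.getTextCharacters, parser.getTextOffset, parser.getTextLength)
// Proposed replacement, passing the text as a String:
generator.writeRaw(parser.getText)

generator.close()
parser.close()

val written = out.toString("UTF-8")
println(written.length)
```

If the replacement behaves as described in the report, the printed length should match the length of the original value.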

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

