[ https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dzcxzl updated SPARK-23603:
---------------------------
    Description:
Jackson (>= 2.7.7) fixes a bug that can drop the tail of the data when the length of the value falls within a certain range.

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
[https://github.com/FasterXML/jackson-core/issues/307]

spark-shell:
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
Expected result: 3000
Actual result: 2991

There are two possible solutions.
One is to *bump Jackson from 2.6.7 and 2.6.7.1 to 2.7.7*.
The other is to *replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)*.

  was:
Jackson(>=2.7.7) fixes the possibility of missing tail data when the length of the value is in a range

[https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
[https://github.com/FasterXML/jackson-core/issues/307]

spark-shell:
{code:java}
val value = "x" * 3000
val json = s"""{"big": "$value"}"""
spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
res0: Array[org.apache.spark.sql.Row] = Array([2991])
{code}
expect result : 3000
actual result : 2991

There are two solutions
One is bump jackson version to 2.7.7
The other one is Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)

> When the length of the json is in a range, get_json_object will result in missing tail data
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23603
>                 URL: https://issues.apache.org/jira/browse/SPARK-23603
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.2.0, 2.3.0
>            Reporter: dzcxzl
>            Priority: Major
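
The second solution in the description, replacing writeRaw(char[] text, int offset, int len) with writeRaw(String text), amounts to copying the character slice into a String before handing it to the generator. Below is a minimal sketch of that idea, not the actual Spark patch. The names RawSliceWorkaround, RawStringWriter, writeRawSlice, buildJson and extractBig are hypothetical, and RawStringWriter is only a stand-in for the String overload of Jackson's JsonGenerator.writeRaw so that the snippet runs without Jackson on the classpath. It does not reproduce the truncation itself; it only shows the shape of the call-site change.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawSliceWorkaround {

    /** Hypothetical stand-in for the String overload of JsonGenerator.writeRaw. */
    @FunctionalInterface
    public interface RawStringWriter {
        void writeRaw(String text) throws IOException;
    }

    /**
     * Copies text[offset, offset + len) into a String and passes it to the
     * String-based writer, instead of calling a (char[], int, int) overload.
     */
    public static void writeRawSlice(RawStringWriter out, char[] text, int offset, int len)
            throws IOException {
        out.writeRaw(new String(text, offset, len));
    }

    /** Builds {"big": "xxx..."} with n copies of 'x', as in the report. */
    public static String buildJson(int n) {
        StringBuilder sb = new StringBuilder("{\"big\": \"");
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.append("\"}").toString();
    }

    /**
     * Writes the value slice of the document through writeRawSlice into an
     * in-memory sink and returns whatever arrived there.
     */
    public static String extractBig(int n) {
        String prefix = "{\"big\": \"";
        char[] buffer = buildJson(n).toCharArray();
        StringBuilder sink = new StringBuilder();
        try {
            // The value starts right after the prefix, so the offset is non-zero.
            writeRawSlice(text -> sink.append(text), buffer, prefix.length(), n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractBig(3000).length()); // prints 3000
    }
}
```

With Jackson available, the lambda would presumably become something like text -> generator.writeRaw(text), where generator is a JsonGenerator. The extra String copy costs an allocation per value, which is the trade-off against bumping the Jackson version.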