[
https://issues.apache.org/jira/browse/SPARK-47150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867395#comment-17867395
]
Anthony Sgro commented on SPARK-47150:
--------------------------------------
This is a big issue for viewing the Spark History Server as well. I can no
longer access the History Server application UI because of this change, and it
needs to be fixed. I would second adding some config that could override the
DEFAULT_MAX_STRING_LEN for jackson-core globally.
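For anyone who controls the JVM process hitting this (e.g. a custom gateway): this is not a Spark config, but jackson-core 2.15+ does expose a process-wide override of its default read constraints. A minimal sketch (the class name {{RaiseJacksonLimit}} is mine; the jackson-core calls are the real API, and the override must run before any JsonFactory/ObjectMapper is created, since factories capture the defaults at construction time):

```java
import com.fasterxml.jackson.core.StreamReadConstraints;

public class RaiseJacksonLimit {
    public static void main(String[] args) {
        // Replace the process-wide defaults before any JsonFactory exists;
        // 20_000_000 is the stock maxStringLength seen in the error above.
        StreamReadConstraints.overrideDefaultStreamReadConstraints(
            StreamReadConstraints.builder()
                .maxStringLength(Integer.MAX_VALUE)
                .build());

        // Factories created from here on inherit the raised limit.
        System.out.println(StreamReadConstraints.defaults().getMaxStringLength());
    }
}
```

This only helps for code paths where jackson builds its factory with the defaults; it won't reach factories that were already constructed, which is exactly why a first-class Spark config would be preferable.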
> String length (...) exceeds the maximum length (20000000)
> ---------------------------------------------------------
>
> Key: SPARK-47150
> URL: https://issues.apache.org/jira/browse/SPARK-47150
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 3.5.0
> Reporter: Sergii Mikhtoniuk
> Priority: Minor
>
> Upgrading to Spark 3.5.0 introduced a regression for us where our query
> gateway (Livy) fails with an error:
> {code:java}
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length
> (20054016) exceeds the maximum length (20000000)
> (sorry, unable to provide full stack trace){code}
> The root of this problem is a breaking change in {{jackson}} that (in the
> name of "safety") introduced default JSON size limits, see:
> [https://github.com/FasterXML/jackson-core/issues/1014]
> Looks like {{JSONOptions}} in Spark already [supports configuring this
> limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
> but there seems to be no way to set it globally or to pass it down to
> [{{DataFrame::toJSON()}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.toJSON.html],
> which our Apache Livy server uses when transmitting data.
> Livy is an old project, transferring dataframes via JSON is very
> inefficient, and we really should move to something like Spark Connect. Still, I
> believe this issue can hit many people working with basic GeoJSON data:
> Spark itself can handle very large strings, and this arbitrary limit just gets in
> the way of output serialization for no good reason.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]