[
https://issues.apache.org/jira/browse/SPARK-47150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867395#comment-17867395
]
Anthony Sgro commented on SPARK-47150:
--------------------------------------
This is a big issue for viewing the Spark History Server as well. I can no
longer access the History Server application UI because of this change, and it
needs to be fixed. I would second adding some config that could override the
DEFAULT_MAX_STRING_LEN for jackson-core globally.
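For anyone who controls the JVM process hitting this (e.g. a custom gateway): this is not a Spark config, but jackson-core 2.15+ does expose a process-wide override of its default read constraints. A minimal sketch (the class name {{RaiseJacksonLimit}} is mine; the jackson-core calls are the real API, and the override must run before any JsonFactory/ObjectMapper is created, since factories capture the defaults at construction time):

```java
import com.fasterxml.jackson.core.StreamReadConstraints;

public class RaiseJacksonLimit {
    public static void main(String[] args) {
        // Replace the process-wide defaults before any JsonFactory exists;
        // 20_000_000 is the stock maxStringLength seen in the error above.
        StreamReadConstraints.overrideDefaultStreamReadConstraints(
            StreamReadConstraints.builder()
                .maxStringLength(Integer.MAX_VALUE)
                .build());

        // Factories created from here on inherit the raised limit.
        System.out.println(StreamReadConstraints.defaults().getMaxStringLength());
    }
}
```

This only helps for code paths where jackson builds its factory with the defaults; it won't reach factories that were already constructed, which is exactly why a first-class Spark config would be preferable.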
> String length (...) exceeds the maximum length (20000000)
> ---------------------------------------------------------
>
> Key: SPARK-47150
> URL: https://issues.apache.org/jira/browse/SPARK-47150
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 3.5.0
> Reporter: Sergii Mikhtoniuk
> Priority: Minor
>
> Upgrading to Spark 3.5.0 introduced a regression for us where our query
> gateway (Livy) fails with an error:
> {code:java}
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length
> (20054016) exceeds the maximum length (20000000)
> (sorry, unable to provide full stack trace){code}
> The root of this problem is a breaking change in {{jackson}} that (in the
> name of "safety") introduced default JSON size limits, see:
> [https://github.com/FasterXML/jackson-core/issues/1014]
> Looks like {{JSONOptions}} in Spark already [supports configuring this
> limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
> but there seems to be no way to set it globally or to pass it down to
> [{{DataFrame::toJSON()}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.toJSON.html],
> which our Apache Livy server uses when transmitting data.
> Livy is an old project, transferring dataframes via JSON is very
> inefficient, and we really should move to something like Spark Connect. Still, I
> believe this issue can hit many people working with basic GeoJSON data:
> Spark itself can handle very large strings, and this arbitrary limit just gets in
> the way of output serialization for no good reason.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]