[ https://issues.apache.org/jira/browse/SPARK-47150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925164#comment-17925164 ]
Xiang Li commented on SPARK-47150:
----------------------------------
Not sure whether this issue could be closed as a dup of
https://issues.apache.org/jira/browse/SPARK-49872
[~sergiimk]
> String length (...) exceeds the maximum length (20000000)
> ---------------------------------------------------------
>
> Key: SPARK-47150
> URL: https://issues.apache.org/jira/browse/SPARK-47150
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 3.5.0
> Reporter: Sergii Mikhtoniuk
> Priority: Minor
>
> Upgrading to Spark 3.5.0 introduced a regression for us where our query
> gateway (Livy) fails with an error:
> {code:java}
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length
> (20054016) exceeds the maximum length (20000000)
> (sorry, unable to provide full stack trace){code}
> The root of this problem is the breaking change in {{jackson}} that (in the
> name of "safety") introduced some JSON size limits, see:
> [https://github.com/FasterXML/jackson-core/issues/1014]
> Looks like {{JSONOptions}} in Spark already [supports configuring this
> limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
> but there seems to be no way to set it globally or pass it down to
> [{{DataFrame::toJSON()}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.toJSON.html]
> which our Apache Livy server is using when transmitting data.
> Livy is an old project and transferring dataframes via JSON is super
> inefficient, and we really should move to something like Spark Connect, but I
> believe this issue can happen to many people working with basic GeoJSON data.
> Spark can handle very large strings, and this arbitrary limit just gets in the
> way of output serialization for no good reason.
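
For anyone trying to find out which values trip the limit quoted in the error above, here is a rough pre-flight sketch in plain Python (standard library only). It walks an already-decoded JSON document and reports every string value longer than 20000000. The helper name oversized_strings and the "$"-style paths are made up for illustration and are not part of Spark, Livy, or Jackson. Python's len() counts code points, which may not match how Jackson measures string length, so treat the numbers as an approximation.

```python
import json

# Limit quoted in the StreamConstraintsException message above.
JACKSON_DEFAULT_MAX_STRING_LEN = 20_000_000


def oversized_strings(value, limit=JACKSON_DEFAULT_MAX_STRING_LEN, path="$"):
    """Yield (path, length) for each string value longer than `limit`.

    Hypothetical helper: only string *values* are checked, not object keys.
    """
    if isinstance(value, str):
        if len(value) > limit:
            yield path, len(value)
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from oversized_strings(child, limit, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from oversized_strings(child, limit, f"{path}[{index}]")


if __name__ == "__main__":
    # Tiny stand-in document with a small limit so the check is easy to follow.
    doc = json.loads('{"name": "demo", "geometry": "' + "x" * 32 + '"}')
    for where, length in oversized_strings(doc, limit=16):
        print(f"{where}: {length} characters exceeds limit")
```

Running something like this over a sample of rows before calling toJSON() would at least show whether a single large field (for example a GeoJSON geometry) is responsible, as opposed to the serialized row as a whole.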