[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-44940:
-------------------------------------
Assignee: Ivan Sadikov
> Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-44940
> URL: https://issues.apache.org/jira/browse/SPARK-44940
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0, 3.5.0, 4.0.0
> Reporter: Ivan Sadikov
> Assignee: Ivan Sadikov
> Priority: Major
> Labels: correctness
>
> Follow-up to https://issues.apache.org/jira/browse/SPARK-40646.
> With "spark.sql.json.enablePartialResults" enabled, JSON parsing is
> significantly slower because exceptions are created as part of normal control
> flow. In addition, some fields are not parsed correctly, and in certain cases
> the following exception is thrown:
> {code:java}
> Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
>     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
>     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
>     at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
>     ... 39 more
> {code}
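
The cost of "exception creation in control flow" can be illustrated outside Spark. The following is a minimal sketch, not Spark's JSON parser: the class `ExceptionControlFlowSketch`, the exception `BadFieldException`, and all method names are hypothetical and exist only for this illustration. It contrasts two ways of reporting a malformed field: throwing an exception per bad field, or returning null so that the caller can keep a partial result.

```java
import java.util.Arrays;
import java.util.List;

class ExceptionControlFlowSketch {

    // Hypothetical exception type standing in for a per-field parse failure.
    static final class BadFieldException extends Exception {
        BadFieldException(String message) {
            super(message);
        }
    }

    // True when the field is a non-empty run of at most nine ASCII digits.
    static boolean looksLikeInt(String field) {
        if (field == null || field.isEmpty() || field.length() > 9) {
            return false;
        }
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Style 1: report a bad field by throwing. Every throw allocates an
    // exception object and captures a stack trace.
    static int parseOrThrow(String field) throws BadFieldException {
        if (!looksLikeInt(field)) {
            throw new BadFieldException("bad field: " + field);
        }
        return Integer.parseInt(field);
    }

    // Style 2: report a bad field with a null result; no exception is created.
    static Integer parseOrNull(String field) {
        return looksLikeInt(field) ? Integer.valueOf(Integer.parseInt(field)) : null;
    }

    // Counts bad fields using exceptions as control flow.
    static int countBadWithExceptions(List<String> fields) {
        int bad = 0;
        for (String field : fields) {
            try {
                parseOrThrow(field);
            } catch (BadFieldException e) {
                bad++;
            }
        }
        return bad;
    }

    // Counts bad fields using null results instead of exceptions.
    static int countBadWithNulls(List<String> fields) {
        int bad = 0;
        for (String field : fields) {
            if (parseOrNull(field) == null) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("1", "x", "42", "", "7");
        System.out.println(countBadWithExceptions(fields));
        System.out.println(countBadWithNulls(fields));
    }
}
```

Both methods report the same number of bad fields. The difference is that the first one pays for an exception allocation and a stack trace capture on every malformed field, which can add up when many records are malformed.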