[
https://issues.apache.org/jira/browse/SPARK-47704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-47704:
-----------------------------------
Labels: pull-request-available (was: )
> JSON parsing fails with "java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to
> org.apache.spark.sql.catalyst.util.ArrayData" when
> spark.sql.json.enablePartialResults is enabled
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-47704
> URL: https://issues.apache.org/jira/browse/SPARK-47704
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Ivan Sadikov
> Priority: Major
> Labels: pull-request-available
>
> When reading the following JSON record \{"a":[{"key":{"b":0}}]}:
> {code:scala}
> val df = spark.read.schema("a array<map<string, struct<b boolean>>>").json(path){code}
> Spark throws the following exception:
> {code:java}
> Cause: java.lang.ClassCastException: class org.apache.spark.sql.catalyst.util.ArrayBasedMapData cannot be cast to class org.apache.spark.sql.catalyst.util.ArrayData (org.apache.spark.sql.catalyst.util.ArrayBasedMapData and org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader 'app')
>   at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:53)
>   at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:53)
>   at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:172)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:605)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.$anonfun$prepareNextFile$1(FileScanRDD.scala:884)
>   at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) {code}
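Independent of Spark, the nesting of the failing record (an array whose elements are maps from string to struct) can be checked with plain Python's json module. This is only an illustration of the record's shape, not part of the reproduction; note that the value 0 does not match the declared boolean type of b, which is presumably what triggers the partial-results code path where the cast fails:

```python
import json

# The input record from the report: a single-element array of maps
# whose values are structs (dicts with a field "b").
record = json.loads('{"a":[{"key":{"b":0}}]}')

# Nesting mirrors the Spark schema: a array<map<string, struct<b boolean>>>
assert isinstance(record["a"], list)            # array
assert isinstance(record["a"][0], dict)         # map<string, ...>
assert isinstance(record["a"][0]["key"], dict)  # struct<b ...>
assert record["a"][0]["key"]["b"] == 0          # 0, not a boolean
```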
>
> The same exception is thrown for \{"a":{"key":[{"b":0}]}}, where the array and map types are swapped:
> {code:scala}
> val df = spark.read.schema("a map<string, array<struct<b boolean>>>").json(path){code}
>
> This is a corner case that https://issues.apache.org/jira/browse/SPARK-44940
> missed.
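The swapped record's mirrored nesting (a map whose values are arrays of structs) can likewise be checked with plain Python's json module, purely to illustrate the shape and not as part of the Spark reproduction:

```python
import json

# Swapped nesting from the report: a map whose values are arrays of structs.
record = json.loads('{"a":{"key":[{"b":0}]}}')

# Nesting mirrors the Spark schema: a map<string, array<struct<b boolean>>>
assert isinstance(record["a"], dict)            # map<string, ...>
assert isinstance(record["a"]["key"], list)     # array<...>
assert isinstance(record["a"]["key"][0], dict)  # struct<b ...>
```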
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]