[
https://issues.apache.org/jira/browse/SPARK-51892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yang Jie resolved SPARK-51892.
------------------------------
Resolution: Duplicate
The issue can no longer be reproduced after Spark 3.5.2; it appears that
SPARK-48863 resolved it, so I'll go ahead and close this as a duplicate for now.
[~praneetsharma] If you confirm that the issue still persists, feel free to
ping me and I'll reopen it.
> Reading JSON file with schema array[array[struct]] fails with
> ClassCastException in Spark 3.5.1
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-51892
> URL: https://issues.apache.org/jira/browse/SPARK-51892
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.1
> Environment: spark-shell of Spark 3.5.1
> Reporter: Praneet Sharma
> Priority: Critical
> Attachments: b.json
>
>
> Hi, we have a JSON with one column of type array[array[struct]]. In Spark
> 3.5.1, reading this JSON with spark.read.json and an explicit schema fails
> with a ClassCastException. The same code worked in Spark 3.3.1.
> *Code to reproduce* (uses the attached b.json as input):
> {code:scala}
> import org.apache.spark.sql.types._
>
> // Innermost struct: one field per primitive column in the JSON records
> val nestedStructSchema = StructType(Seq(
>   StructField("c_union", IntegerType, true),
>   StructField("c_boolean", BooleanType, true),
>   StructField("c_double", DoubleType, true),
>   StructField("c_int", IntegerType, true),
>   StructField("c_long", IntegerType, true),
>   StructField("c_string", StringType, true)
> ))
>
> // Wrap the struct in two levels of arrays: array[array[struct]]
> val innerArraySchema = ArrayType(nestedStructSchema, true)
> val outerArraySchema = ArrayType(innerArraySchema, true)
> val finalSchema = StructType(Seq(
>   StructField("array_array_struct", outerArraySchema, true)
> ))
>
> val df3 = spark.read.schema(finalSchema).json("/home/devbld/Desktop/b.json")
> df3.show
> {code}
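> For reference, a hypothetical single-line record matching this schema might
> look as follows (this is an illustration only, not the contents of the
> attached b.json):
> {code:json}
> {"array_array_struct": [[{"c_union": 1, "c_boolean": true, "c_double": 1.5, "c_int": 7, "c_long": 7, "c_string": "a"}]]}
> {code}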
> *Error*:
> {code:java}
> Caused by: java.lang.ClassCastException: class
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast
> to class org.apache.spark.sql.catalyst.util.ArrayData
> (org.apache.spark.sql.catalyst.expressions.GenericInternalRow and
> org.apache.spark.sql.catalyst.util.ArrayData are in unnamed module of loader
> 'app')
> at
> org.apache.spark.sql.catalyst.util.GenericArrayData.getArray(GenericArrayData.scala:77)
> at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
> Source)
> at
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.$anonfun$apply$1(FileFormat.scala:156)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:197)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]