rahil-c commented on PR #8682: URL: https://github.com/apache/hudi/pull/8682#issuecomment-1558191054
Hi @danny0405 @xiarixiaoyao, we are trying to upgrade Spark to 3.4.0 in Hudi, but we are hitting several functional test failures caused by another casting exception. For example, running `TestAvroSchemaResolutionSupport#testArrayOfMapsChangeValueType` fails with `java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.spark.sql.vectorized.ColumnarBatch`:

```
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.spark.sql.vectorized.ColumnarBatch
	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.next(DataSourceScanExec.scala:600)
	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.next(DataSourceScanExec.scala:589)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
```

We can get past this by disabling both the vectorized reader and whole-stage code generation, but I do not think these are acceptable workarounds. We would appreciate your thoughts, and we would be happy to sync offline at some point to share our findings as well.
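For reference, the workarounds mentioned above (which we do not consider acceptable fixes) correspond to the following standard Spark SQL configs; this is a sketch assuming a live `SparkSession` named `spark` and Parquet base files:

```scala
// Workarounds only, not a fix:
// 1. Disable the Parquet vectorized reader so the file scan emits
//    InternalRow instead of ColumnarBatch.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
// 2. Disable whole-stage codegen so the generated ColumnarToRow
//    iterator (where the cast happens) is not exercised.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```

With either of these set, the cast in `FileSourceScanExec`'s generated `columnartorow_nextBatch` path is avoided, which is why the tests pass, but it sacrifices read performance across the board.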
