[I] Collecting parquet without any transformations throws an exception [datafusion-comet]

via GitHub Fri, 04 Apr 2025 11:47:33 -0700


l0kr opened a new issue, #1588:
URL: https://github.com/apache/datafusion-comet/issues/1588


   ### Describe the bug
   
   When Parquet is loaded with the Spark scan and converted to native, 
collecting the DataFrame without any transformation throws an exception: 
`java.lang.ClassCastException: class 
org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to class 
org.apache.spark.sql.catalyst.InternalRow`
   
   More detailed stacktrace:
   ```
   Caused by: java.lang.ClassCastException: class org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to class org.apache.spark.sql.catalyst.InternalRow (org.apache.spark.sql.vectorized.ColumnarBatch and org.apache.spark.sql.catalyst.InternalRow are in unnamed module of loader 'app')
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:389)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:139)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
   ```
   
   ### Steps to reproduce
   
   ```scala
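     test("Reproduce error with collect") {
       withSQLConf(
         // Read Parquet with the Spark scan instead of Comet's native scan ...
         CometConf.COMET_NATIVE_SCAN_ENABLED.key -> "false",
         // ... and convert the scanned data to native.
         CometConf.COMET_CONVERT_FROM_PARQUET_ENABLED.key -> "true"
       ) {
         withTempDir { dir =>
           // Write a small two-column Parquet dataset to the temp directory.
           val written = spark
             .range(10000)
             .selectExpr("id as key", "id % 8 as value")
             .toDF("key", "value")
           written.write.mode("overwrite").parquet(dir.toString)

           // Read it back and collect with no transformation in between.
           val df = spark.read.parquet(dir.toString)
           df.collect() // throws the ClassCastException shown above
         }
       }
     }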
   ```
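   
   The trace shows the cast failing inside `Iterator.next` rather than where the plan is built. One way that can happen on the JVM is type erasure: an iterator of batches is treated as an iterator of rows, and the mismatch only surfaces when an element is consumed. The snippet below is a minimal sketch of that mechanism only. `Row`, `Batch`, and `ErasureDemo` are hypothetical stand-ins, not Spark or Comet classes, and this is not a confirmed diagnosis of the bug.
   
   ```scala
   // Hypothetical stand-ins for InternalRow and ColumnarBatch (not Spark classes).
   final case class Row(value: Long)
   final case class Batch(rows: Seq[Row])

   object ErasureDemo {
     // Returns the simple name of the exception raised while consuming
     // an iterator whose static element type does not match its contents.
     def consumeMistyped(): String = {
       val batches: Iterator[Any] = Iterator(Batch(Seq(Row(1L), Row(2L))))
       // Unchecked cast: succeeds at runtime because the element type is erased.
       val rows = batches.asInstanceOf[Iterator[Row]]
       try {
         // The real cast to Row happens here, when the mapped element is used.
         rows.map(r => r.value).next()
         "no exception"
       } catch {
         case e: ClassCastException => e.getClass.getSimpleName
       }
     }
   }
   ```
   
   In this sketch the `asInstanceOf` call itself does not fail; the exception appears only once `next()` hands a `Batch` to code that expects a `Row`.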
   
   ### Expected behavior
   
   `df.collect()` should return the rows without throwing an exception.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

