JensonChoi commented on code in PR #958:
URL: https://github.com/apache/datafusion-comet/pull/958#discussion_r1837756417

##########
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala:
##########
@@ -1707,6 +1707,29 @@ class CometExecSuite extends CometTestBase {
     }
   }
 
+  test("SparkToColumnar override node name for row input") {
+    withSQLConf(
+      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
+      val df = spark
+        .range(1000)
+        .selectExpr("id as key", "id % 8 as value")
+        .toDF("key", "value")
+        .groupBy("key")
+        .count()
+      df.collect()
+
+      val planAfter = df.queryExecution.executedPlan
+      assert(planAfter.toString.startsWith("AdaptiveSparkPlan isFinalPlan=true"))
+      val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
+      val nodeNames = adaptivePlan.collect { case c: CometSparkToColumnarExec =>
+        c.nodeName
+      }
+      assert(nodeNames.length == 1)
+      assert(nodeNames.head == "CometSparkRowToColumnar")

Review Comment:
   Hey @andygrove, I'm a bit stuck on the unit test for `CometSparkColumnarToColumnar`. I pushed a commit with what I've been working on so you can take a look. However, I'm getting this error when I run the unit test:
   ```
   Cause: java.lang.ClassCastException: class org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to class org.apache.spark.sql.catalyst.InternalRow (org.apache.spark.sql.vectorized.ColumnarBatch and org.apache.spark.sql.catalyst.InternalRow are in unnamed module of loader 'app')
     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
     at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:389)
     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
     at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
     at org.apache.spark.scheduler.Task.run(Task.scala:139)
   ```
   Would appreciate any help. Thank you.
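   For context on where the cast happens: the trace points at Spark's row-based collect path, where `SparkPlan.getByteArrayRdd` iterates the RDD from `execute()` and assumes every element is an `InternalRow`. Below is a minimal sketch of the generic `SparkPlan` contract involved (plain Spark API, not Comet code; the class name `MyColumnarExec` is hypothetical):
   ```scala
   import org.apache.spark.rdd.RDD
   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.execution.SparkPlan
   import org.apache.spark.sql.vectorized.ColumnarBatch

   // Hypothetical operator illustrating the contract: a node that produces
   // ColumnarBatch must advertise it via supportsColumnar and emit batches
   // only from doExecuteColumnar(). If batches come out of doExecute(),
   // Spark's row-based collect path casts each element to InternalRow and
   // throws exactly the ClassCastException above.
   abstract class MyColumnarExec extends SparkPlan {

     // Tells the planner to insert a ColumnarToRowExec transition above this
     // node before any row-based consumer such as collect().
     override def supportsColumnar: Boolean = true

     // Columnar output belongs here.
     override protected def doExecuteColumnar(): RDD[ColumnarBatch]

     // Row-based execution is unsupported; never return batches from here.
     override protected def doExecute(): RDD[InternalRow] =
       throw new UnsupportedOperationException(
         s"$nodeName does not support row-based execution")
   }
   ```
   So one thing that might be worth checking, assuming the new exec reports `supportsColumnar = true`: whether the failing plan actually has a `ColumnarToRowExec` (or equivalent transition) above the columnar node before `df.collect()` runs, or whether something is calling `execute()` on it directly.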
########## spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala: ########## @@ -1707,6 +1707,29 @@ class CometExecSuite extends CometTestBase { } } + test("SparkToColumnar override node name for row input") { + withSQLConf( + SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", + CometConf.COMET_SHUFFLE_MODE.key -> "jvm") { + val df = spark + .range(1000) + .selectExpr("id as key", "id % 8 as value") + .toDF("key", "value") + .groupBy("key") + .count() + df.collect() + + val planAfter = df.queryExecution.executedPlan + assert(planAfter.toString.startsWith("AdaptiveSparkPlan isFinalPlan=true")) + val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan + val nodeNames = adaptivePlan.collect { case c: CometSparkToColumnarExec => + c.nodeName + } + assert(nodeNames.length == 1) + assert(nodeNames.head == "CometSparkRowToColumnar") Review Comment: Hey @andygrove, I'm a bit stuck on the unit test for `CometSparkColumnarToColumnar`. I pushed a commit that contains what I've been working on, so you can take a look. However, I'm getting this error when I run the unit test: ``` Cause: java.lang.ClassCastException: class org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to class org.apache.spark.sql.catalyst.InternalRow (org.apache.spark.sql.vectorized.ColumnarBatch and org.apache.spark.sql.catalyst.InternalRow are in unnamed module of loader 'app') at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:389) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:139) ``` Would appreciate any help. Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org