andygrove commented on code in PR #958: URL: https://github.com/apache/datafusion-comet/pull/958#discussion_r1822970181

##########
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
##########

@@ -1707,6 +1707,29 @@ class CometExecSuite extends CometTestBase {

```scala
  test("SparkToColumnar override node name for row input") {
    withSQLConf(
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
      CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
      val df = spark
        .range(1000)
        .selectExpr("id as key", "id % 8 as value")
        .toDF("key", "value")
        .groupBy("key")
        .count()
      df.collect()

      val planAfter = df.queryExecution.executedPlan
      assert(planAfter.toString.startsWith("AdaptiveSparkPlan isFinalPlan=true"))
      val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
      val nodeNames = adaptivePlan.collect { case c: CometSparkToColumnarExec =>
        c.nodeName
      }
      assert(nodeNames.length == 1)
      assert(nodeNames.head == "CometSparkRowToColumnar")
```

Review Comment:

Could you also add a test that generates a plan using `CometSparkColumnarToColumnar`, so that we cover both cases? I think you could make a copy of this test that writes the DataFrame to a Parquet file and then reads it back with the following configs. This will use Spark's vectorized Parquet reader, which returns Spark columnar batches.

```scala
      SQLConf.USE_V1_SOURCE_LIST.key -> "",
      CometConf.COMET_NATIVE_SCAN_ENABLED.key -> "false",
      CometConf.COMET_CONVERT_FROM_PARQUET_ENABLED.key -> "true") {
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
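For reference, the companion test the reviewer asks for could look roughly like the sketch below. This is a hedged illustration, not the final test: it assumes the same `CometTestBase` helpers used elsewhere in the suite (`withSQLConf`, `withTempPath`, `spark`), and it assumes `CometSparkToColumnarExec` reports the node name `CometSparkColumnarToColumnar` when its input is already columnar (the behavior this PR is adding).

```scala
  test("SparkToColumnar override node name for columnar input") {
    withTempPath { path =>
      // Write some data to Parquet so we can read it back through
      // Spark's vectorized Parquet reader.
      spark
        .range(1000)
        .selectExpr("id as key", "id % 8 as value")
        .write
        .parquet(path.toString)

      withSQLConf(
        SQLConf.USE_V1_SOURCE_LIST.key -> "",
        CometConf.COMET_NATIVE_SCAN_ENABLED.key -> "false",
        CometConf.COMET_CONVERT_FROM_PARQUET_ENABLED.key -> "true") {
        // With the native scan disabled, Spark's vectorized reader produces
        // Spark columnar batches, so Comet should insert a
        // columnar-to-columnar conversion rather than row-to-columnar.
        val df = spark.read.parquet(path.toString)
        df.collect()

        val nodeNames = df.queryExecution.executedPlan.collect {
          case c: CometSparkToColumnarExec => c.nodeName
        }
        assert(nodeNames.nonEmpty)
        assert(nodeNames.forall(_ == "CometSparkColumnarToColumnar"))
      }
    }
  }
```

The key difference from the row-input test above is the data source: scanning Parquet through Spark's vectorized reader yields columnar input, whereas `spark.range` produces rows.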
########## spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala: ########## @@ -1707,6 +1707,29 @@ class CometExecSuite extends CometTestBase { } } + test("SparkToColumnar override node name for row input") { + withSQLConf( + SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", + CometConf.COMET_SHUFFLE_MODE.key -> "jvm") { + val df = spark + .range(1000) + .selectExpr("id as key", "id % 8 as value") + .toDF("key", "value") + .groupBy("key") + .count() + df.collect() + + val planAfter = df.queryExecution.executedPlan + assert(planAfter.toString.startsWith("AdaptiveSparkPlan isFinalPlan=true")) + val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan + val nodeNames = adaptivePlan.collect { case c: CometSparkToColumnarExec => + c.nodeName + } + assert(nodeNames.length == 1) + assert(nodeNames.head == "CometSparkRowToColumnar") Review Comment: Could you also add a test that will generate a plan that uses `CometSparkColumnarToColumnar` so that we are testing both cases? I think you could have a copy of this test that writes the dataframe to a Parquet file and then reads the Parquet file back with the following configs. This will use Spark's vectorized Parquet reader which returns Spark columns. ``` SQLConf.USE_V1_SOURCE_LIST.key -> "", CometConf.COMET_NATIVE_SCAN_ENABLED.key -> "false", CometConf.COMET_CONVERT_FROM_PARQUET_ENABLED.key -> "true") { ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org