andygrove commented on code in PR #1756:
URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098530916


##########
spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala:
##########
@@ -99,6 +100,56 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSparkPlanHelper {
     }
   }
 
+  test("select column with default value") {
+    // This test fails in Spark's vectorized Parquet reader for DECIMAL(36,18) or BINARY default values.
+    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
+      // This test relies on two tables: 1) t1, the Parquet file generated by ParquetGenerator with random values, and
+      // 2) t2, a new table created with one column, to which we add a second column with different types and random values.
+      // We use the schema and values of t1 to simplify random value generation for the default column value in t2.
+      val df = spark.read.parquet(filename)
+      df.createOrReplaceTempView("t1")
+      for (col <- df.columns
+          .slice(1, 14)) { // All the primitive columns, based on ParquetGenerator.makeParquetFile.

Review Comment:
   This code could break if we change the Parquet generator in the future. It would probably be better to filter instead. Something like:
   ```scala
   val columns = df.schema.fields.filter(f => !isComplexType(f.dataType)).map(_.name)
   ```
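
   For illustration, here is a minimal, self-contained sketch of the schema-based filtering the comment suggests. The `isComplexType` helper, the local SparkSession setup, and the Parquet path are assumptions for the example, not code from the Comet test suite:

   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

   object PrimitiveColumnFilterExample {

     // Treat nested Spark SQL types as "complex"; everything else is considered a primitive leaf type.
     // This helper is an assumption for the sketch; Comet's own isComplexType may differ.
     def isComplexType(dt: DataType): Boolean = dt match {
       case _: StructType | _: ArrayType | _: MapType => true
       case _ => false
     }

     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("primitive-column-filter")
         .getOrCreate()

       // Hypothetical input path; in the test this would be the ParquetGenerator output file.
       val df = spark.read.parquet("/tmp/fuzz-test.parquet")

       // Derive the primitive (non-nested) column names from the schema instead of relying on
       // a fixed positional slice, so changes to the generator's column layout do not break the test.
       val primitiveColumns = df.schema.fields.filter(f => !isComplexType(f.dataType)).map(_.name)

       primitiveColumns.foreach(println)
       spark.stop()
     }
   }
   ```

   Filtering on the schema keeps the test correct even if ParquetGenerator later changes the number or order of generated columns.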
########## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ########## @@ -99,6 +100,56 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSparkPlanHelper { } } + test("select column with default value") { + // This test fails in Spark's vectorized Parquet reader for DECIMAL(36,18) or BINARY default values. + withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") { + // This test relies on two tables: 1) t1 the Parquet file generated by ParquetGenerator with random values, and + // 2) t2 is a new table created with one column which we add a second column with different types and random values. + // We use the schema and values of t1 to simplify random value generation for the default column value in t2. + val df = spark.read.parquet(filename) + df.createOrReplaceTempView("t1") + for (col <- df.columns + .slice(1, 14)) { // All the primitive columns based on ParquetGenerator.makeParquetFile. Review Comment: this code could break if we change the parquet generator in the future. It would probably be best to filter instead. Something like: ```scala val columns = df.schema.fields.filter(f => !isComplexType(f.dataType)).map(_.name) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org