mbutrovich commented on issue #1875:
URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2967624631

   Spark 3.4 and 3.5 handle struct conversion differently in this test case: 3.4 inserts a `cast` expression in the Project operator, while 3.5 uses a `named_struct` expression.
   
   Spark 3.4:
   ```
   == Physical Plan ==
   CommandResult <empty>
      +- Execute InsertIntoHadoopFsRelationCommand file:/Users/matt/git/datafusion-comet/spark-warehouse/tab2, false, Parquet, [path=file:/Users/matt/git/datafusion-comet/spark-warehouse/tab2], Append, `spark_catalog`.`default`.`tab2`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/Users/matt/git/datafusion-comet/spark-warehouse/tab2), [p]
         +- WriteFiles
            +- *(1) Project [cast(s#3 as struct<b:string,a:string>) AS p#6]
               +- *(1) ColumnarToRow
                  +- FileScan parquet spark_catalog.default.tab1[s#3] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/matt/git/datafusion-comet/spark-warehouse/tab1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<s:struct<a:string,b:string>>
   ```
   
   Spark 3.5:
   ```
   == Physical Plan ==
   CommandResult <empty>
      +- Execute InsertIntoHadoopFsRelationCommand file:/Users/matt/git/datafusion-comet/spark-warehouse/tab2, false, Parquet, [path=file:/Users/matt/git/datafusion-comet/spark-warehouse/tab2], Append, `spark_catalog`.`default`.`tab2`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/Users/matt/git/datafusion-comet/spark-warehouse/tab2), [p]
         +- WriteFiles
            +- *(1) Project [if (isnull(s#12)) null else named_struct(b, s#12.a, a, s#12.b) AS p#20]
               +- *(1) ColumnarToRow
                  +- FileScan parquet spark_catalog.default.tab1[s#12] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/matt/git/datafusion-comet/spark-warehouse/tab1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<s:struct<a:string,b:string>>
   ```
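   Both plans implement the same positional semantics: the target struct's Nth field takes the source's Nth value, and only the field name changes (note `named_struct(b, s#12.a, a, s#12.b)` in the 3.5 plan). A minimal plain-Scala sketch of that rule, with no Spark involved and `Field`/`castStruct` as purely illustrative names:
   ```scala
   // Spark's CAST between struct types matches fields by position, not by
   // name: the target's Nth field takes the source's Nth value, and only
   // the field name changes.
   case class Field(name: String, value: String)

   def castStruct(src: Seq[Field], targetNames: Seq[String]): Seq[Field] = {
     require(src.length == targetNames.length, "field counts must match")
     src.zip(targetNames).map { case (f, n) => Field(n, f.value) }
   }

   // s = named_struct('a','1','b','2') cast to struct<b:string,a:string>
   val recast = castStruct(Seq(Field("a", "1"), Field("b", "2")), Seq("b", "a"))
   // recast == Seq(Field("b","1"), Field("a","2")) -- values keep their positions
   ```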
   This test doesn't fail on Spark 3.5 with Comet, but that doesn't mean the cast logic is correct--the `cast` expression simply never appears on 3.5. Here's a succinct test case showing that the cast issue still exists on 3.5:
   ```scala
     test("cast StructType to StructType with different names") {
       withTable("tab1") {
         sql("""
              |CREATE TABLE tab1 (s struct<a: string, b: string>)
              |USING parquet
            """.stripMargin)
         sql("INSERT INTO TABLE tab1 SELECT named_struct('col1','1','col2','2')")
         if (usingDataSourceExec) {
           checkSparkAnswerAndOperator(
             "SELECT CAST(s AS struct<field1:string, field2:string>) AS new_struct FROM tab1")
         } else {
           // Should fall back to Spark since the non-DataSourceExec scan does
           // not support nested types.
           checkSparkAnswer(
             "SELECT CAST(s AS struct<field1:string, field2:string>) AS new_struct FROM tab1")
         }
         }
       }
     }
   ```
   I will work on a fix in `cast.rs` and include the test above.

