acvictor commented on PR #11609:
URL: https://github.com/apache/incubator-gluten/pull/11609#issuecomment-3912693045

   > @acvictor Thank you for the additional information. I’m still a bit unclear: in a plan like `op0 -> R2C -> op1`, if all plan nodes involving the timestamp_ntz type fall back, could you help clarify which operator op1 might be that would lead to an R2C being inserted?
   
   I added debug logs to Gluten and used the example of the test `use TIMESTAMP_NTZ in a partition column` from [here](https://github.com/delta-io/delta/blob/04968443875e696df85e6e3dc9b18148eb50ad9f/spark/src/test/scala/org/apache/spark/sql/delta/DeltaTimestampNTZSuite.scala#L104). The test creates a table with schema `c1 STRING, c2 TIMESTAMP, c3 TIMESTAMP_NTZ` partitioned by `c3`, inserts a row, then calls `spark.table("delta_test").head`.
   
     `op1` here would be `ColumnarCollectLimitExec`.
   
     The actual runtime plan is:

      VeloxColumnarToRowExec
        └── ColumnarCollectLimitExec          - op1
              └── RowToVeloxColumnarExec
                    └── WholeStageCodegenExec   - op0 (vanilla Spark, wraps the fallen-back FileScan)
                          └── ColumnarToRow
                                └── FileScan parquet spark_catalog.default.delta_test [c1, c2, c3(TimestampNTZ)] PARTITIONED BY (c3)
   
     Debug logs added to `Transitions.scala` confirm this:
   
   ```
   [TRANSITION-DEBUG] node: ColumnarCollectLimit
   [TRANSITION-DEBUG]   conv: Impl(None$,VanillaBatchType$) -> Impl(Any,Is(VeloxBatchType$))
   [TRANSITION-DEBUG]   child: Scan parquet spark_catalog.default.delta_test
   [TRANSITION-DEBUG]   new: RowToVeloxColumnar
   [TRANSITION-DEBUG]   schema: StructType(...,StructField(c3,TimestampNTZType,true))
   ```
   `ColumnarCollectLimitExec` appears despite the FallbackByTimestampNTZ validator because `CollectLimitTransformerRule` is registered as a post-transform rule, which runs after validation. The rule sees the vanilla `CollectLimitExec` with a columnar child and unconditionally replaces it with `ColumnarCollectLimitExec`, bypassing the validator entirely. `InsertTransitions` then sees a convention mismatch (VanillaBatch vs. VeloxBatch) and inserts the `RowToVeloxColumnarExec`, which throws an exception in `SparkArrowUtil.toArrowSchema` because there is no case for `TimestampNTZType`. The validator alone cannot handle this, because post-transform rules like `CollectLimitTransformerRule` can reintroduce Gluten native operators after validation has already run.
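
   To make the failure concrete, here is a minimal sketch of the shape of that type mapping (illustrative only, not Gluten's actual `SparkArrowUtil` source; the names and cases shown are assumptions):

   ```scala
   import org.apache.spark.sql.types._

   // Illustrative stand-in for the Spark-to-Arrow type mapping performed in
   // SparkArrowUtil.toArrowSchema; the cases shown are assumptions.
   def toArrowTypeName(dt: DataType): String = dt match {
     case StringType    => "Utf8"
     case TimestampType => "Timestamp(MICROSECOND, <session tz>)"
     // ... other supported types ...
     // No case for TimestampNTZType, so a schema containing c3 falls through
     // to the error branch and RowToVeloxColumnarExec fails at runtime.
     case other =>
       throw new UnsupportedOperationException(s"Unsupported type: $other")
   }
   ```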

