acvictor commented on PR #11609: URL: https://github.com/apache/incubator-gluten/pull/11609#issuecomment-3912693045
> @acvictor Thank you for the additional information. I’m still a bit unclear: in a plan like `op0 -> R2C -> op1`, if all plan nodes involving the timestamp_ntz type fall back, could you help clarify which operator op1 might be that would lead to an R2C being inserted?

I added debug logs to Gluten and used the test `use TIMESTAMP_NTZ in a partition column` from [here](https://github.com/delta-io/delta/blob/04968443875e696df85e6e3dc9b18148eb50ad9f/spark/src/test/scala/org/apache/spark/sql/delta/DeltaTimestampNTZSuite.scala#L104) as the example. This test creates a table with schema `c1 STRING, c2 TIMESTAMP, c3 TIMESTAMP_NTZ` partitioned by `c3`, inserts a row, then calls `spark.table("delta_test").head`. Here op1 is ColumnarCollectLimitExec. The actual runtime plan is:

    VeloxColumnarToRowExec
    └── ColumnarCollectLimitExec - op1
        └── RowToVeloxColumnarExec
            └── WholeStageCodegenExec - op0 (vanilla Spark, wraps FileScan fallback)
                └── ColumnarToRow
                    └── FileScan parquet spark_catalog.default.delta_test [c1, c2, c3(TimestampNTZ)] PARTITIONED BY (c3)

Debug logs added to Transitions.scala confirm it:

```
[TRANSITION-DEBUG] node: ColumnarCollectLimit
[TRANSITION-DEBUG] conv: Impl(None$,VanillaBatchType$) -> Impl(Any,Is(VeloxBatchType$))
[TRANSITION-DEBUG] child: Scan parquet spark_catalog.default.delta_test
[TRANSITION-DEBUG] new: RowToVeloxColumnar
[TRANSITION-DEBUG] schema: StructType(...,StructField(c3,TimestampNTZType,true))
```

ColumnarCollectLimitExec appears despite the FallbackByTimestampNTZ validator because the rule that creates it is registered as a post-transform rule, which runs after validation. That rule sees the vanilla CollectLimitExec with a columnar child and unconditionally replaces it with ColumnarCollectLimitExec, bypassing the validator entirely. InsertTransitions then sees a convention mismatch (VanillaBatch -> VeloxBatch) and inserts RowToVeloxColumnarExec, which throws an exception in SparkArrowUtil.toArrowSchema because there is no case for TimestampNTZType.
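To make the ordering problem concrete, here is a minimal toy model of the sequence described above, written in Python for brevity. It is only a sketch: `Node`, `offload_with_validation`, `post_transform_collect_limit`, `insert_transitions`, and `to_arrow_schema` are hypothetical stand-ins and not Gluten or Spark APIs, and the columnar-child check is left out for simplicity.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Toy model only: none of these names are Gluten or Spark APIs.
@dataclass(frozen=True)
class Node:
    name: str
    native: bool                      # True if the node is a native (Velox) operator
    schema: Tuple[str, ...]           # output column types
    child: Optional["Node"] = None

UNSUPPORTED = {"timestamp_ntz"}

def offload_with_validation(node: Node) -> Node:
    """Validated offload: a node whose schema has an unsupported type stays vanilla."""
    child = offload_with_validation(node.child) if node.child else None
    supported = not (set(node.schema) & UNSUPPORTED)
    return replace(node, child=child, native=supported)

def post_transform_collect_limit(node: Node) -> Node:
    """Runs after validation and never consults it (assumed simplification:
    the columnar-child condition is not modeled here)."""
    child = post_transform_collect_limit(node.child) if node.child else None
    node = replace(node, child=child)
    if node.name == "CollectLimit" and not node.native:
        return replace(node, name="ColumnarCollectLimit", native=True)
    return node

def to_arrow_schema(schema: Tuple[str, ...]) -> Tuple[str, ...]:
    """Stand-in for a schema conversion that has no case for the unsupported type."""
    for column_type in schema:
        if column_type in UNSUPPORTED:
            raise TypeError(f"unsupported type: {column_type}")
    return schema

def insert_transitions(node: Node) -> Node:
    """Insert a row-to-columnar step wherever a native parent sits on a vanilla child."""
    if node.child is None:
        return node
    child = insert_transitions(node.child)
    if node.native and not child.native:
        to_arrow_schema(child.schema)  # raises for timestamp_ntz
        child = Node("RowToVeloxColumnar", True, child.schema, child)
    return replace(node, child=child)

schema = ("string", "timestamp", "timestamp_ntz")
plan = Node("CollectLimit", False, schema, Node("FileScan", False, schema))

validated = offload_with_validation(plan)            # both nodes stay vanilla
rewritten = post_transform_collect_limit(validated)  # native operator reintroduced

try:
    insert_transitions(rewritten)
except TypeError as err:
    print("transition insertion failed:", err)
```

In this model the validation pass does its job, since both nodes stay vanilla, yet the later rewrite still produces a native parent over a vanilla child, and the failure only surfaces when the transition is inserted.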
The validator alone cannot handle this because post-transform rules like CollectLimitTransformerRule can reintroduce Gluten native operators after validation has already run.
