felipepessoto opened a new pull request, #12292:
URL: https://github.com/apache/gluten/pull/12292

   ## What changes are proposed in this pull request?
   
   This bug was uncovered by the Delta Spark UT pipeline (#12278).
   
   `GlutenDeltaJobStatsTracker` builds the per-file statistics aggregation as a 
`SortAggregateExec -> ProjectExec` plan, runs Gluten's `HeuristicTransform`, 
then unconditionally casts the result to a `WholeStageTransformer`. When the 
statistics aggregation cannot be offloaded to Velox -- for example `min`/`max` 
over a `TIMESTAMP_NTZ` column, as exercised by Delta's 
`DataSkippingDeltaV1Suite` "data skipping on TIMESTAMP_NTZ near Long.MaxValue" 
-- the projection stays a vanilla `ProjectExec` and the cast throws:
   
   ```
   java.lang.ClassCastException: org.apache.spark.sql.execution.ProjectExec 
cannot be cast to org.apache.gluten.execution.WholeStageTransformer
   ```
   
   in the per-task tracker constructor (`GlutenDeltaJobStatsTracker.scala`), 
failing the write.
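
   The failure mode can be reproduced in isolation. The following is a minimal sketch, not the Gluten code: `PlanNode`, `ProjectNode`, `WholeStageNode`, and `CastDemo` are hypothetical stand-ins for the Spark and Gluten plan classes. It shows why an unconditional cast throws when the plan was not offloaded, and how a pattern match reports the same condition without an exception.

```scala
// Hypothetical stand-ins for the Spark/Gluten plan nodes; not the real classes.
sealed trait PlanNode
final case class ProjectNode(name: String) extends PlanNode
final case class WholeStageNode(child: PlanNode) extends PlanNode

object CastDemo {
  // Mirrors the failing pattern: assume the transformed plan is always native.
  // Throws ClassCastException when `plan` is still a ProjectNode.
  def unsafeNative(plan: PlanNode): WholeStageNode =
    plan.asInstanceOf[WholeStageNode]

  // Checks the runtime type instead; None means "not offloaded".
  def safeNative(plan: PlanNode): Option[WholeStageNode] = plan match {
    case w: WholeStageNode => Some(w)
    case _                 => None
  }
}
```

   With a plan that stayed a `ProjectNode`, `CastDemo.unsafeNative` throws while `CastDemo.safeNative` returns `None`, which the caller can use to pick another code path.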
   
   This PR decides on the **driver** whether the aggregation actually offloads: 
a new `canOffloadStats()` dry-runs the same transform pipeline once and checks 
whether it collapses into a `WholeStageTransformer`. If it does not, the 
`DeltaJobStatisticsTracker` is routed to the existing 
`GlutenDeltaJobStatsFallbackTracker` (columnar-to-row + the original Delta 
tracker, which produces correct statistics for any type) instead of the native 
tracker. Evaluating this on the driver also keeps the per-task constructor 
from allocating a single-thread executor and a `NativePlanEvaluator` that it 
would otherwise create before reaching the failing cast. The fix is applied to 
both the Delta 3.x (`src-delta33`) and Delta 4.x 
(`src-delta40`) copies.
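
   The driver-side decision described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this PR: only the name `canOffloadStats` comes from the description, while `Plan`, `VanillaProject`, `NativeWholeStage`, `StatsTracker`, `transform`, `selectTracker`, and the `unsupported` type set are hypothetical stand-ins for the real plan classes, trackers, and `HeuristicTransform` pipeline.

```scala
// Hypothetical model of the tracker selection; not Gluten's API.
sealed trait Plan
final case class VanillaProject(aggTypes: Seq[String]) extends Plan
final case class NativeWholeStage(aggTypes: Seq[String]) extends Plan

sealed trait StatsTracker
case object NativeTracker extends StatsTracker   // stands in for the native tracker
case object FallbackTracker extends StatsTracker // stands in for the row-based fallback

object TrackerSelection {
  // Assumption for this sketch: these column types cannot be offloaded.
  private val unsupported = Set("TIMESTAMP_NTZ")

  // Stand-in for the transform pipeline: the plan collapses into a native
  // whole-stage node only when every aggregated type is supported.
  def transform(plan: VanillaProject): Plan =
    if (plan.aggTypes.exists(unsupported.contains)) plan
    else NativeWholeStage(plan.aggTypes)

  // Dry-run the transform once and inspect the result type.
  def canOffloadStats(plan: VanillaProject): Boolean = transform(plan) match {
    case _: NativeWholeStage => true
    case _                   => false
  }

  // Route to the native tracker only when the dry run succeeded.
  def selectTracker(plan: VanillaProject): StatsTracker =
    if (canOffloadStats(plan)) NativeTracker else FallbackTracker
}
```

   In this model a plan that aggregates a `TIMESTAMP_NTZ` column selects `FallbackTracker`, and any other plan selects `NativeTracker`. Because the check runs once before any task starts, no per-task code has to handle a failed cast.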
   
   ## How was this patch tested?
   
   Added `GlutenDeltaStatsSuite`, which writes a Delta table whose 
`TIMESTAMP_NTZ` min/max statistics cannot be offloaded to Velox. Before this 
change the write crashes with the `ClassCastException` above; after it, the 
write succeeds via the row-based fallback tracker.
   
   Locally verified (Spark 3.5, Scala 2.12): the new suite fails without the 
fix (`Tests: succeeded 0, failed 1`, with the `ClassCastException` above) and 
passes with it (`succeeded 1, failed 0`). A companion test-only PR demonstrates 
the same red/green contrast on CI. Also confirmed end-to-end against Delta's 
`DataSkippingDeltaV1Suite` "data skipping on TIMESTAMP_NTZ near Long.MaxValue" 
(succeeded 2, failed 0 with the fix).
   
   `scalafmt`/spotless report no changes.
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: GitHub Copilot CLI (claude-opus-4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

