Xtpacz opened a new pull request, #12261:
URL: https://github.com/apache/gluten/pull/12261

   Fix: https://github.com/apache/gluten/issues/12260
   
   ## What changes are proposed in this pull request?
   
   
   `CheckOverflowTransformer` reads `original.child.dataType` to decide whether 
to insert a cast. For `BinaryArithmetic`, Spark's `.dataType` returns 
`left.dataType` rather than the arithmetic result type. After child 
transformers apply rescale optimizations, the actual output type may differ 
from the Spark-declared type, and the cast is wrongly skipped.
   
   The resulting Substrait plan contains decimal types that do not match the 
function signatures. Velox's SimpleFunction validation rejects the plan, and 
`ColumnarPartialProjectRule` makes the entire Project fall back to the JVM. 
The result is still correct (via fallback), but native acceleration is lost.
   
   ### Reproducer
   ```sql
   CREATE TABLE t1 (val BIGINT) USING parquet;
   CREATE TABLE t2 (val BIGINT) USING parquet;
   INSERT INTO t1 VALUES (200);
   INSERT INTO t2 VALUES (100), (100), (100), (100), (100);
   
   SELECT
       a.val,
       (a.val - COALESCE(SUM(b.val), 0) / 5.0)
           / (COALESCE(SUM(b.val), 0) / 5.0) AS growth_rate
   FROM t1 a CROSS JOIN t2 b
   GROUP BY a.val;
   ```
   
   
   ### Root cause
   
https://github.com/apache/gluten/blob/fc90a7933afaf5518ad20a60a5d79d482cea5ef1/gluten-substrait/src/main/scala/org/apache/gluten/expression/UnaryExpressionTransformer.scala#L90
   This line reads the Spark expression's declared type instead of the 
transformer's actual output type.
   
   ### Fix
   
   ```diff
   - original.child.dataType,
   + child.dataType,
   ```
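
The difference between the two reads can be illustrated with a toy model. This is a minimal sketch, not Gluten code: the classes `DecimalType`, `SparkExpr`, and `Transformer`, the helper `needs_cast`, and the concrete precisions are all hypothetical, and the rule "insert a cast only when the child type differs from the target type" is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int


@dataclass(frozen=True)
class SparkExpr:
    """Stand-in for a Spark expression: carries only its declared type."""
    data_type: DecimalType


@dataclass(frozen=True)
class Transformer:
    """Stand-in for a child transformer whose output type may have been rescaled."""
    original: SparkExpr
    data_type: DecimalType  # type actually produced after transformation


def needs_cast(child_type: DecimalType, target_type: DecimalType) -> bool:
    # Assumed rule: a cast is inserted only when the two types differ.
    return child_type != target_type


# Hypothetical types: the declared type happens to equal the target,
# but the rescaled child actually produces a narrower decimal.
target = DecimalType(38, 6)
child = Transformer(
    original=SparkExpr(DecimalType(38, 6)),
    data_type=DecimalType(27, 6),
)

# Before the fix: the decision uses the Spark-declared type, so the cast is skipped.
cast_before_fix = needs_cast(child.original.data_type, target)
# After the fix: the decision uses the transformer's actual type, so the cast is inserted.
cast_after_fix = needs_cast(child.data_type, target)

print(cast_before_fix, cast_after_fix)
```

In this model the declared type hides the mismatch, which mirrors the skipped cast described above; reading the transformer's own type exposes it.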
   
   
   
   ## How was this patch tested?
   
   
     **Before fix** — Project falls back to JVM:
   
     ```
     == Final Plan ==
     * Project (17)                                         ← JVM, codegen id=3
     +- VeloxColumnarToRow (16)                             ← extra C2R conversion
        +- ^ RegularHashAggregateExecTransformer (14)
           +- ^ VeloxBroadcastNestedLoopJoinExecTransformer (13)
              :- ^ InputIteratorTransformer (7)
              :  +- BroadcastQueryStage (5)
              :     +- ColumnarBroadcastExchange (4)
              :        +- RowToVeloxColumnar (3)
              :           +- * ColumnarToRow (2)
              :              +- BatchScan (1)
              +- ^ InputIteratorTransformer (12)
                 +- RowToVeloxColumnar (10)
                    +- * ColumnarToRow (9)
                       +- BatchScan (8)
     ```
   
     **After fix** — Project runs natively in Velox:
   
     ```
     == Final Plan ==
     VeloxColumnarToRow (17)
     +- ^ ProjectExecTransformer (15)                       ← native Velox Project
        +- ^ RegularHashAggregateExecTransformer (14)
           +- ^ VeloxBroadcastNestedLoopJoinExecTransformer (13)
              :- ^ InputIteratorTransformer (7)
              :  +- BroadcastQueryStage (5)
              :     +- ColumnarBroadcastExchange (4)
              :        +- RowToVeloxColumnar (3)
              :           +- * ColumnarToRow (2)
              :              +- BatchScan (1)
              +- ^ InputIteratorTransformer (12)
                 +- RowToVeloxColumnar (10)
                    +- * ColumnarToRow (9)
                       +- BatchScan (8)
     ```
   
     Key differences:
   
     - The Project node changes from `Project` (JVM, `*` = codegen) to `ProjectExecTransformer` (Velox native, `^` = transformer)
     - `VeloxColumnarToRow` moves from **before** Project (forced conversion to feed the JVM) to **after** Project (deferred until output)
     - Aggregate→Project pipeline stays in Velox without breaking
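
A plan comparison like the one above could also be checked mechanically. The following is a rough sketch that works on plan text only: the helpers `project_nodes` and `project_is_native` are hypothetical and are not part of Gluten or Spark. It assumes the plan is available as a string in the same tree layout as the excerpts above.

```python
def project_nodes(plan: str) -> list[str]:
    """Return operator names from plan lines whose operator name contains 'Project'."""
    names = []
    for line in plan.splitlines():
        # Drop tree-drawing characters and the '^' / '*' markers before the operator name.
        stripped = line.lstrip(" :+-^*")
        if not stripped:
            continue
        name = stripped.split()[0]
        if "Project" in name:
            names.append(name)
    return names


def project_is_native(plan: str) -> bool:
    """True if the plan has Project nodes and every one is a ProjectExecTransformer."""
    nodes = project_nodes(plan)
    return bool(nodes) and all(n == "ProjectExecTransformer" for n in nodes)


# Shortened excerpts of the two plans shown above.
before = """\
* Project (17)
+- VeloxColumnarToRow (16)
   +- ^ RegularHashAggregateExecTransformer (14)
"""
after = """\
VeloxColumnarToRow (17)
+- ^ ProjectExecTransformer (15)
   +- ^ RegularHashAggregateExecTransformer (14)
"""

print(project_is_native(before), project_is_native(after))
```

Matching on operator names is brittle if the explain format changes, so this is only meant as a quick sanity check rather than a replacement for a proper test in the suite.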
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

