n0r0shi opened a new pull request, #11674:
URL: https://github.com/apache/incubator-gluten/pull/11674

    ## Summary
     - Adds `Sig[MonotonicallyIncreasingID]` to 
`ExpressionMappings.SCALAR_SIGS` so the function is offloaded to Velox instead 
of falling back to vanilla Spark.
     - Sets Velox's `expression.dedup_non_deterministic` to `false` to match 
Spark semantics: Spark never deduplicates non-deterministic expressions, so 
each call keeps its own independent state.
     - Un-ignores and fixes the test in `ScalarFunctionsValidateSuite`.
   
     ## Context
     PR #10097 previously attempted this but was closed because of a result 
mismatch (#7628): `SELECT monotonically_increasing_id(), 
monotonically_increasing_id()` returned
   ```
     ┌─────┬───────┬───────┐
     │ Row │ Col 1 │ Col 2 │
     ├─────┼───────┼───────┤
     │ 0   │ 0     │ 2     │
     ├─────┼───────┼───────┤
     │ 1   │ 1     │ 3     │
     └─────┴───────┴───────┘
   ```
    instead of Spark's expected result:
   ```
     ┌─────┬───────┬───────┐
     │ Row │ Col 1 │ Col 2 │
     ├─────┼───────┼───────┤
     │ 0   │ 0     │ 0     │
     ├─────┼───────┼───────┤
     │ 1   │ 1     │ 1     │
     └─────┴───────┴───────┘
   ```
   The root cause was Velox's expression compiler deduplicating the two 
structurally identical calls into one shared counter instance, so the second 
column continued counting from where the first column stopped.
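
   The mismatch can be reproduced with a toy model. This is only an 
illustrative sketch, not Velox code: `MonotonicCounter` and `project` are 
hypothetical names, and it assumes each output column is evaluated over the 
whole batch before the next column, which is consistent with the `0, 1` / 
`2, 3` output above.

```python
class MonotonicCounter:
    """Toy stand-in for one stateful, non-deterministic function instance."""

    def __init__(self):
        self.next_id = 0

    def eval_batch(self, num_rows):
        # Produce one id per row and advance the internal state.
        out = list(range(self.next_id, self.next_id + num_rows))
        self.next_id += num_rows
        return out


def project(instances, num_rows):
    # Evaluate each output column over the whole batch, one column at a time
    # (an assumption of this sketch).
    return [inst.eval_batch(num_rows) for inst in instances]


# Deduplicated: both output columns point at the same instance.
shared = MonotonicCounter()
deduplicated = project([shared, shared], num_rows=2)

# Not deduplicated: each call site owns its own state.
independent = project([MonotonicCounter(), MonotonicCounter()], num_rows=2)

print(deduplicated)  # [[0, 1], [2, 3]]
print(independent)   # [[0, 1], [0, 1]]
```

   Each inner list is one output column, so the first result corresponds to 
the mismatching table and the second to Spark's expected result.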
   
     Velox has since added the `expression.dedup_non_deterministic` config 
(facebookincubator/velox#15008) to control this behavior. This PR sets it to 
`false` for Gluten. The setting only affects non-deterministic expressions; 
deduplication of deterministic expressions is unchanged.
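
   A rough sketch of what such a switch does, under stated assumptions: 
`Call`, `Compiled`, and `compile_exprs` are hypothetical names for a toy 
compiler that caches nodes by structural equality. It is not Velox's actual 
implementation and only mirrors the behavior described above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Call:
    """Structural description of a function call (hashable, compared by value)."""
    name: str
    args: Tuple = ()
    deterministic: bool = True


class Compiled:
    """Toy compiled node; identity matters because a real node may hold state."""

    def __init__(self, call):
        self.call = call


def compile_exprs(calls, dedup_non_deterministic):
    cache = {}
    compiled = []
    for call in calls:
        # Deterministic calls are always shareable; non-deterministic ones
        # are shareable only when the flag allows it.
        can_share = call.deterministic or dedup_non_deterministic
        if can_share and call in cache:
            compiled.append(cache[call])
            continue
        node = Compiled(call)
        if can_share:
            cache[call] = node
        compiled.append(node)
    return compiled


mono_id = Call("monotonically_increasing_id", deterministic=False)
upper = Call("upper", ("name",))

flag_on = compile_exprs([mono_id, mono_id, upper, upper], True)
flag_off = compile_exprs([mono_id, mono_id, upper, upper], False)

print(flag_on[0] is flag_on[1])    # True: one shared stateful node
print(flag_off[0] is flag_off[1])  # False: two independent nodes
print(flag_off[2] is flag_off[3])  # True: deterministic dedup unchanged
```

   With the flag off, only the two non-deterministic call sites stop sharing a 
node; the deterministic `upper` calls are still compiled once.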
   
     **Question for reviewers:** Is setting `expression.dedup_non_deterministic 
= false` globally the right approach? An alternative would be conditionally 
disabling it only when stateful expressions like `monotonically_increasing_id` 
are detected in the plan, but we believe the global approach is correct since 
Spark semantics never deduplicate non-deterministic expressions.
   
     Closes #7628
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

