andygrove opened a new issue, #4596:
URL: https://github.com/apache/datafusion-comet/issues/4596

   ### Background
   
   PR #4538 introduced an opt-in `CodegenDispatchFallback` marker trait. When 
an expression reports `Incompatible` and the user has not enabled 
`allowIncompatible`, mixing in the trait routes it through the JVM codegen 
dispatcher (Spark's own `doGenCode` inside the Comet pipeline) so the 
projection stays native while matching Spark exactly, instead of falling the 
whole projection back to Spark.
   
   Before that, enrollment was opt-out, which silently routed a number of 
native `Incompatible` expressions through the dispatcher without deliberate 
review or test coverage. Those were reverted to Spark fallback (matching 
`main`) in PR #4538. This issue tracks deliberately opting them back in, one at 
a time, each with explicit test coverage proving the dispatched path matches 
Spark.
   
   ### Candidate expressions
   
   These report `Incompatible` for some inputs, have a real Spark `doGenCode` 
(so the dispatcher can accept them), and currently fall back to Spark:
   
   - [ ] `concat` (`CometConcat`) - non-UTF8_BINARY collation
   - [ ] `array_intersect` (`CometArrayIntersect`)
   - [ ] `array_except` (`CometArrayExcept`)
   - [ ] `array_join` (`CometArrayJoin`)
   - [ ] `from_utc_timestamp` (`CometFromUTCTimestamp`)
   - [ ] `to_utc_timestamp` (`CometToUTCTimestamp`)
   - [ ] `convert_timezone` (`CometConvertTimezone`)
   - [ ] `sort_array` (`CometSortArray`) - verify dispatcher eligibility first; 
this one may be `CodegenFallback` in Spark, in which case the dispatcher 
refuses it and it should stay on Spark fallback
   
   ### What to do per expression
   
   1. Confirm the dispatcher accepts it (not `CodegenFallback`, supported 
input/output types per `CometBatchKernelCodegen.canHandle`). If refused, leave 
it as Spark fallback and note it.
   2. Mix `CodegenDispatchFallback` into the serde that is **registered** in 
`QueryPlanSerde` (not a delegate; see how `Reverse` registers `CometReverse`, 
which delegates to `CometArrayReverse`).
   3. Add tests asserting native execution that matches Spark for the 
`Incompatible` inputs (`checkSparkAnswerAndOperator` / `query` in SQL file 
tests, replacing the `expect_fallback` assertion).
   4. For the timezone-sensitive expressions, test across multiple session time 
zones, since dispatch correctness depends on the resolved `timeZoneId` 
surviving closure serialization.
   
   ### Notes
   
   The expressions that have no native implementation and extend 
`CometCodegenDispatch` (`hypot`, `bround`, `sequence`, `elt`, ...) are out of 
scope: they report `Compatible` and already dispatch.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to