andygrove commented on PR #4538:
URL: 
https://github.com/apache/datafusion-comet/pull/4538#issuecomment-4626837504

   ## Follow-up: make codegen-dispatch enrollment opt-in (commit 3a9ec2d41)
   
   While reviewing the codegen-dispatch wiring I found a soundness issue with 
how expressions were enrolled onto the dispatcher, and reworked it.
   
   ### The issue
   
   Routing an `Incompatible` result through the JVM codegen dispatcher was 
**opt-out**: `CometExpressionSerde.allowIncompatCodegenDispatch` defaulted to 
`true`, so *every* expression that reports `Incompatible` (and is not 
explicitly allowed via `allowIncompatible`) was routed through the dispatcher 
unless it remembered to override the flag to `false`. Only 5 serdes did so.
   
   That default silently enrolled a dozen-plus pre-existing native expressions 
that this PR never intended to touch. Checked against Spark source for real 
`doGenCode` (the dispatcher refuses `CodegenFallback`), the implicitly-enrolled 
set included `array_except`, `array_intersect`, `array_join`, `concat`, 
`from_utc_timestamp`, `to_utc_timestamp`, `convert_timezone`, and `sort_array`. 
None had deliberate test coverage, the change was invisible in review, and it 
was most concerning for the timezone-sensitive expressions where dispatch 
correctness depends on state surviving closure serialization.
   
   ### The new approach
   
   Enrollment is now **opt-in** via a marker trait:
   
   ```scala
   trait CodegenDispatchFallback { self: CometExpressionSerde[_] => }
   ```
   
   The `QueryPlanSerde` `Incompatible` branch routes through the dispatcher 
only when the handler mixes in `CodegenDispatchFallback`, otherwise it falls 
back to Spark. The `allowIncompatCodegenDispatch` flag and its 5 `= false` 
overrides are removed (those serdes simply do not opt in, so their behavior is 
unchanged).
   
   The 8 serdes this PR deliberately dispatches are enrolled explicitly, each 
backed by the test coverage added here:
   
   | Serde | Incompatible case routed to dispatch |
   | --- | --- |
   | `CometHour`, `CometMinute`, `CometSecond` | `TimestampNTZ` |
   | `CometFromUnixTime` | always |
   | `CometMapFromEntries` | `BinaryType` key/value |
   | `CometReverse` | binary-element arrays, non-UTF8_BINARY collation |
   | `CometTruncDate`, `CometTruncTimestamp` | non-literal format |
   
   The accidentally-enrolled expressions above revert to Spark fallback, 
matching `main`. The expressions that have no native implementation and extend 
`CometCodegenDispatch` (`hypot`, `bround`, `sequence`, `elt`, ...) are 
unaffected: they report `Compatible` and never reach this branch.
   
   ### Validation
   
   - `CometArrayExpressionSuite`, `CometMapExpressionSuite`, 
`CometTemporalExpressionSuite` all green (cover `reverse`/`map_from_entries` 
binary and `trunc`/`date_trunc` dispatch).
   - `CometSqlFileTestSuite` failures are identical to the pre-change branch 
(pre-existing, unrelated), so no regressions from this change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to