andygrove opened a new pull request, #4636:
URL: https://github.com/apache/datafusion-comet/pull/4636
## Which issue does this PR close?
Part of #4596 (the array group: `array_intersect`, `array_except`,
`array_join`).
## Rationale for this change
These three expressions report `Incompatible` because the null handling and
element ordering of DataFusion's native implementation can diverge from
Spark's. With `allowIncompatible` unset, they therefore force the whole
projection to fall back to Spark. They all have a real Spark `doGenCode` and
supported input/output types, so they are eligible for the
`CodegenDispatchFallback` path introduced in #4538, which routes the
`Incompatible` result through the JVM codegen dispatcher (Spark's own
`doGenCode` inside the Comet pipeline) so that the projection stays native
while matching Spark exactly.
## What changes are included in this PR?
* `CometArrayIntersect`, `CometArrayExcept`, `CometArrayJoin` mix in
`CodegenDispatchFallback`. `ArrayIntersect`'s `Unsupported` collation case is
unchanged (still falls back); only the `Incompatible` case dispatches.
* Notes in `docs/source/user-guide/latest/expressions.md` are updated: these
expressions now route through the dispatcher by default, with the native
incompatible path opt-in via `allowIncompatible`.
The native opt-in path (`allowIncompatible=true`) is unchanged.
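
The routing described above can be summarized as a small decision table. What follows is a minimal Python sketch of that behavior, not Comet's actual code: `SupportLevel`, `Path`, and `choose_path` are hypothetical names introduced for illustration, and the `Compatible` branch is an assumption, since this PR only discusses the `Incompatible` and `Unsupported` cases.

```python
from enum import Enum


class SupportLevel(Enum):
    """Support levels an expression can report (hypothetical model)."""
    COMPATIBLE = "Compatible"
    INCOMPATIBLE = "Incompatible"
    UNSUPPORTED = "Unsupported"


class Path(Enum):
    """Where the expression ends up being evaluated (hypothetical model)."""
    NATIVE = "native implementation"
    DISPATCH = "JVM codegen dispatcher inside the Comet pipeline"
    SPARK_FALLBACK = "whole projection falls back to Spark"


def choose_path(level, allow_incompatible, has_dispatch_fallback):
    """Map a support level and settings to an execution path.

    has_dispatch_fallback models whether the expression mixes in
    CodegenDispatchFallback.
    """
    if level is SupportLevel.COMPATIBLE:
        # Assumption: compatible expressions always run natively.
        return Path.NATIVE
    if level is SupportLevel.INCOMPATIBLE:
        if allow_incompatible:
            # Opt-in native path, unchanged by this PR.
            return Path.NATIVE
        if has_dispatch_fallback:
            # New default for the three array expressions.
            return Path.DISPATCH
        return Path.SPARK_FALLBACK
    # Unsupported (for example the ArrayIntersect collation case) still falls back.
    return Path.SPARK_FALLBACK
```

Under this model, an `Incompatible` expression with `allowIncompatible` unset moves from `SPARK_FALLBACK` to `DISPATCH` once it mixes in the fallback trait, while every other combination keeps its previous path.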
## How are these changes tested?
* `array_join.sql`: upgraded from `spark_answer_only` to `query` so it now
asserts native execution matching Spark, including the `array('a', NULL, 'c')`
null case.
* New `array_intersect_dispatch.sql` and `array_except_dispatch.sql`:
exercise the dispatch path with `allowIncompatible` unset over the exact inputs
the native path handles incompatibly (the right-longer-than-left ordering case
for intersect, and the literal/literal case for except that the native path
could not handle). Both assert native execution matching Spark with no
`sort_array` workaround.
* The existing `array_intersect.sql` / `array_except.sql` tests (native
`allowIncompatible=true` path) still pass.
All run with `CometSqlFileTestSuite` on Spark 3.5 and pass.
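
For reference, the null case that `array_join.sql` now checks natively follows Spark's documented `array_join` rule: null elements are skipped unless a replacement string is supplied. Below is a minimal Python model of that rule, for illustration only. The function name `spark_array_join` and the `,` delimiter are assumptions, and this is neither the Spark nor the Comet implementation.

```python
def spark_array_join(arr, delimiter, null_replacement=None):
    """Model of Spark's array_join semantics for string arrays.

    null_replacement=None here means "argument not supplied"; the case of an
    explicitly NULL replacement argument is not modeled.
    """
    # A NULL array or NULL delimiter yields NULL.
    if arr is None or delimiter is None:
        return None
    parts = []
    for element in arr:
        if element is None:
            # Without a replacement, null elements are filtered out.
            if null_replacement is None:
                continue
            parts.append(null_replacement)
        else:
            parts.append(element)
    return delimiter.join(parts)


print(spark_array_join(["a", None, "c"], ","))       # a,c
print(spark_array_join(["a", None, "c"], ",", "?"))  # a,?,c
```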