andygrove opened a new pull request, #4270:
URL: https://github.com/apache/datafusion-comet/pull/4270

   ## Which issue does this PR close?
   
   Closes #4269.
   
   ## Rationale for this change
   
   Comet already supports the `explode` generator for array inputs but rejects 
`posexplode` and `posexplode_outer`, forcing a fallback to Spark for any query 
that needs the position column. This PR closes that gap so workloads that use 
the `posexplode(arr) AS (pos, col)` pattern (or the SQL `LATERAL VIEW 
posexplode(...)` form) stay native.
   
   ## What changes are included in this PR?
   
   - `CometExplodeExec` (Scala serde) now accepts both `Explode` and 
`PosExplode` Catalyst nodes and propagates a new `position` flag through the 
operator proto.
   - `Explode` proto message gets a `bool position` field (tag 4).
   - The native planner builds a parallel `List<Int32>` positions column from a 
new `ListPositionsExpr` and passes it to DataFusion's `UnnestExec` as a second 
`ListUnnest` entry. Output schema injects `pos: Int32 not null` before the 
unnested element.
   - `ListPositionsExpr` 
(`native/core/src/execution/expressions/list_positions.rs`) reuses the input 
`ListArray` offsets and null bitmap, populating each row with `[0..len-1]`. 
Sharing offsets with the source array is what lets `UnnestExec` unnest both 
columns in parallel without alignment drift.
   - Maps still fall back (#2837) and `outer=true` keeps its Incompatible 
status (DataFusion [#19053](https://github.com/apache/datafusion/issues/19053)).
   - Hand-written notes in `docs/source/user-guide/latest/operators.md` and 
`understanding-comet-plans.md` updated.
   
   The `implement-comet-expression` and `audit-comet-expression` skills were 
used to scaffold the implementation and follow-up coverage.
   
   ## How are these changes tested?
   
   `CometGenerateExecSuite` was extended with 12 new tests covering:
   
   - `posexplode` over arrays (simple, empty, null, strings, nullable elements)
   - `posexplode_outer` over a simple array (`allowIncompat=true`)
   - multiple projected columns alongside the generator
   - map input falls back with the expected reason
   - `posexplode` over an array of structs (with field projection out of the 
unnested struct)
   - `posexplode` in a SQL `LATERAL VIEW`
   - `posexplode` of a literal array
   - `posexplode` across batch boundaries with `spark.comet.batchSize=4` to 
verify the parallel positions/values lists stay aligned across batches
   
   The existing `CometGenerateExecSuite` continues to pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to