gortiz opened a new pull request, #18782:
URL: https://github.com/apache/pinot/pull/18782

   ## Summary
   
   Adds an **opt-in, default-off** query option `unnestColumnPruning` for the 
multi-stage engine that prunes input/passthrough columns — notably the 
**unnested source array** — from the `UNNEST` output when nothing downstream 
references them.
   
   Today, for a query like:
   
   ```sql
   SELECT e.col1, u.s FROM e CROSS JOIN UNNEST(e.mcol1) AS u(s)
   ```
   
   the `UnnestNode` output schema is the full Calcite Correlate row type 
`[col1, mcol1, s]`, and `UnnestOperator` copies the entire input row (including 
the source array `mcol1`) into **every** one of the N exploded rows — only for 
a parent `Project` to immediately drop `mcol1`. For large arrays this 
needlessly widens every intermediate row (and serializes the array N times when 
an exchange sits between the UNNEST and the projecting operator).
   
   The array expression is only needed in the operator's **input** (to evaluate 
the explode), never in its **output**, unless the user also selects it. With 
the flag on, the source array is no longer carried.
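   
   A back-of-the-envelope sketch of that overhead for the query above, assuming 
an N-element array and that an exchange serializes every carried column of 
every row in full. The class and method names are hypothetical and not part of 
this PR:
   
   ```java
   final class UnnestRowCostSketch {
       /** Cells written when each of the n exploded rows carries `carriedColumns` input columns plus one element. */
       static long outputCells(long n, int carriedColumns) {
           return n * (carriedColumns + 1L);
       }

       /** Array elements serialized when the n-element source array rides on all n rows (assumed full copy per row). */
       static long arrayElementsSerialized(long n, boolean arrayCarried) {
           return arrayCarried ? n * n : 0L;
       }

       public static void main(String[] args) {
           long n = 1_000;
           System.out.println(outputCells(n, 2));                  // legacy schema [col1, mcol1, s]
           System.out.println(outputCells(n, 1));                  // pruned schema [col1, s]
           System.out.println(arrayElementsSerialized(n, true));   // legacy, with an exchange in between
           System.out.println(arrayElementsSerialized(n, false));  // pruned
       }
   }
   ```
   
   Under these assumptions the serialized array payload grows quadratically in 
N on the legacy path and disappears entirely once `mcol1` is pruned.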
   
   ## How it works
   
   - **`UnnestNode`** carries a passthrough index map + `prunedPassthrough` 
flag. Legacy constructors default to *not pruned* (copy the whole input row), 
so existing behavior is unchanged.
   - **`RelToPlanNodeConverter`** fuses the pruning into 
`convertLogicalProject`: when a `Project` sits directly above a 
Correlate/Uncollect (no wrapping correlate-filter), it computes which left 
columns the project actually references, builds a pruned `UnnestNode` (smaller 
output schema + passthrough map + recomputed element/ordinality indexes), and 
remaps that one project's `InputRef`s. It falls back to the current behavior in 
every other shape, and converts the correlate at most once.
   - **`UnnestOperator`** honors the passthrough map, copying only retained 
columns (resolved to a primitive `int[]` so the per-row hot path stays 
allocation/box-free). When not pruned, it keeps the legacy whole-row 
`System.arraycopy`.
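   
   The three pieces above can be illustrated with a simplified, standalone 
sketch. It is not the code in this PR: the class and method names are 
hypothetical, rows are plain `Object[]`, only a single array is unnested, and 
the pruned schema is assumed to lay out the retained passthrough columns first 
with the element column after them:
   
   ```java
   import java.util.Arrays;
   import java.util.BitSet;

   final class UnnestPruningSketch {
       /** Planner side: sorted input-column indexes that the parent project references. */
       static int[] passthroughIndexes(BitSet referencedLeftColumns) {
           return referencedLeftColumns.stream().toArray();
       }

       /** Planner side: maps an index in the full correlate row type to its index in the pruned schema. */
       static int remap(int oldIndex, int[] passthrough, int leftFieldCount) {
           if (oldIndex >= leftFieldCount) {
               // Element/ordinality columns are assumed to follow the retained passthrough columns.
               return passthrough.length + (oldIndex - leftFieldCount);
           }
           int pos = Arrays.binarySearch(passthrough, oldIndex);
           if (pos < 0) {
               throw new IllegalArgumentException("column " + oldIndex + " was pruned");
           }
           return pos;
       }

       /** Operator side: explodes one input row, copying only the retained columns. */
       static Object[][] explode(Object[] inputRow, int arrayIndex, int[] passthrough) {
           Object[] elements = (Object[]) inputRow[arrayIndex];
           Object[][] out = new Object[elements.length][];
           for (int i = 0; i < elements.length; i++) {
               Object[] row = new Object[passthrough.length + 1];
               for (int j = 0; j < passthrough.length; j++) {
                   row[j] = inputRow[passthrough[j]];  // primitive int[] lookup, no boxing
               }
               row[passthrough.length] = elements[i];
               out[i] = row;
           }
           return out;
       }

       public static void main(String[] args) {
           // Input row type [col1, mcol1]; the project references only col1 (index 0) and s.
           BitSet referenced = new BitSet();
           referenced.set(0);
           int[] passthrough = passthroughIndexes(referenced);
           Object[] input = {"a", new Object[] {1, 2, 3}};
           for (Object[] row : explode(input, 1, passthrough)) {
               System.out.println(Arrays.toString(row));
           }
           // s moves from index 2 in [col1, mcol1, s] to index 1 in [col1, s].
           System.out.println(remap(2, passthrough, 2));
       }
   }
   ```
   
   Note that the array is still read from the input row at `arrayIndex`; only 
the output row shrinks.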
   
   ## Backward compatibility / rolling upgrades
   
   `UnnestNode` is serialized broker→server and the operator runs server-side, 
so this is a two-sided change. The proto fields (`passthroughInputIndexes`, 
`prunedPassthrough`) are **additive**:
   
   - **old broker → new server**: fields absent ⇒ `prunedPassthrough=false` ⇒ 
legacy "copy whole row". Safe.
   - **new broker → old server**: a smaller output schema would break an 
un-upgraded server. This is exactly why the option **defaults to off**: a new 
broker never emits the pruned schema until someone explicitly enables the 
option, which should only happen once the whole server fleet is upgraded. A 
future release can flip the default once a minimum server version is guaranteed.
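   
   A minimal sketch of the server-side fallback, assuming absent proto fields 
decode to `false` and an empty list. The class and method below are 
hypothetical stand-ins, not the PR's serde code:
   
   ```java
   import java.util.List;

   final class UnnestCompatSketch {
       /** Returns the passthrough part of an output row for one input row. */
       static Object[] passthrough(Object[] inputRow, boolean prunedPassthrough,
               List<Integer> passthroughInputIndexes) {
           if (!prunedPassthrough) {
               // Plan from an old broker: both fields are absent, so keep the legacy whole-row copy.
               Object[] copy = new Object[inputRow.length];
               System.arraycopy(inputRow, 0, copy, 0, inputRow.length);
               return copy;
           }
           // Plan from a new broker with the option enabled: copy only the listed columns.
           Object[] pruned = new Object[passthroughInputIndexes.size()];
           for (int i = 0; i < pruned.length; i++) {
               pruned[i] = inputRow[passthroughInputIndexes.get(i)];
           }
           return pruned;
       }

       public static void main(String[] args) {
           Object[] row = {"a", "b"};
           System.out.println(passthrough(row, false, List.of()).length);  // legacy: whole row
           System.out.println(passthrough(row, true, List.of()).length);   // pruned, zero-passthrough
       }
   }
   ```
   
   The sketch also suggests why a separate flag is useful: an empty index list 
alone could not distinguish a legacy plan from a pruned plan that retains zero 
passthrough columns.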
   
   ## Tests
   
   - `UnnestSqlPlannerTest` — flag on/off, source-array-also-selected (no-op), 
WITH ORDINALITY, multiple arrays, zero-passthrough.
   - `PlanNodeSerDeTest` — protobuf round-trip for pruned (non-sequential 
indexes + ordinality) and legacy `UnnestNode`.
   - `UnnestOperatorTest` — pruned single-array, zero-passthrough, WITH 
ORDINALITY, multiple arrays; legacy path unchanged.
   - `UnnestIntegrationTest` — end-to-end pruned-vs-default result equality 
across single/multi array, WITH ORDINALITY, zero-passthrough, and 
array-also-selected shapes.
   
   ## Notes for reviewers
   
   - UNNEST is only supported on the logical planner path, so the change is 
confined to `RelToPlanNodeConverter` + the node/operator + serde; the V2 
physical path is untouched.
   - Default-off; enabling it is a rolling-upgrade-ordering decision (servers 
before brokers).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

