gortiz opened a new pull request, #18782: URL: https://github.com/apache/pinot/pull/18782
## Summary

Adds an **opt-in, default-off** query option `unnestColumnPruning` for the multi-stage engine that prunes input/passthrough columns (notably the **unnested source array**) from the `UNNEST` output when nothing downstream references them.

Today, for a query like:

```sql
SELECT e.col1, u.s
FROM e CROSS JOIN UNNEST(e.mcol1) AS u(s)
```

the `UnnestNode` output schema is the full Calcite Correlate row type `[col1, mcol1, s]`, and `UnnestOperator` copies the entire input row (including the source array `mcol1`) into **every** one of the N exploded rows, only for a parent `Project` to immediately drop `mcol1`. For large arrays this needlessly widens every intermediate row (and serializes the array N times when an exchange sits between the UNNEST and the projecting operator).

The array expression is only needed in the operator's **input** (to evaluate the explode), never in its **output**, unless the user also selects it. With the flag on, the source array is no longer carried.

## How it works

- **`UnnestNode`** carries a passthrough index map + `prunedPassthrough` flag. Legacy constructors default to *not pruned* (copy the whole input row), so existing behavior is unchanged.
- **`RelToPlanNodeConverter`** fuses the pruning into `convertLogicalProject`: when a `Project` sits directly above a Correlate/Uncollect (no wrapping correlate-filter), it computes which left columns the project actually references, builds a pruned `UnnestNode` (smaller output schema + passthrough map + recomputed element/ordinality indexes), and remaps that one project's `InputRef`s. It falls back to the current behavior in every other shape, and converts the correlate at most once.
- **`UnnestOperator`** honors the passthrough map, copying only retained columns (resolved to a primitive `int[]` so the per-row hot path stays allocation/box-free). When not pruned, it keeps the legacy whole-row `System.arraycopy`.
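To make the operator-side difference concrete, here is a minimal, self-contained sketch of the two copy paths described above. It is not the actual `UnnestOperator` code: the class `UnnestPruningSketch`, the method `explode`, and the convention of appending the exploded element as the last output column are assumptions made for illustration only, and a `null` map stands in for `prunedPassthrough=false`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and layout are not taken from the Pinot code base.
public class UnnestPruningSketch {

  /**
   * Explodes one input row into one output row per array element.
   * passthrough == null models the legacy (not pruned) path: the whole input row is copied.
   * Otherwise only the input columns listed in passthrough are copied, in that order.
   * Assumption: the exploded element is appended as the last output column.
   */
  static List<Object[]> explode(Object[] inputRow, int arrayIndex, int[] passthrough) {
    Object[] elements = (Object[]) inputRow[arrayIndex];
    int carried = passthrough == null ? inputRow.length : passthrough.length;
    List<Object[]> out = new ArrayList<>(elements.length);
    for (Object element : elements) {
      Object[] outRow = new Object[carried + 1];
      if (passthrough == null) {
        // Legacy path: whole-row copy, source array included.
        System.arraycopy(inputRow, 0, outRow, 0, inputRow.length);
      } else {
        // Pruned path: copy only retained columns through a primitive int[] map.
        for (int i = 0; i < passthrough.length; i++) {
          outRow[i] = inputRow[passthrough[i]];
        }
      }
      outRow[carried] = element;
      out.add(outRow);
    }
    return out;
  }

  public static void main(String[] args) {
    // Input row [col1, mcol1]; only col1 (index 0) is referenced downstream.
    Object[] row = {"a", new Object[]{"x", "y"}};
    for (Object[] r : explode(row, 1, new int[]{0})) {
      System.out.println(Arrays.toString(r)); // prints [a, x] then [a, y]
    }
  }
}
```

In this sketch the pruned rows have width 2 instead of 3, and the source array is referenced once in the input rather than once per exploded row, which is the saving the PR targets.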
## Backward compatibility / rolling upgrades

`UnnestNode` is serialized broker→server and the operator runs server-side, so this is a two-sided change. The proto fields (`passthroughInputIndexes`, `prunedPassthrough`) are **additive**:

- **old broker → new server**: fields absent ⇒ `prunedPassthrough=false` ⇒ legacy "copy whole row". Safe.
- **new broker → old server**: a smaller output schema would break an un-upgraded server. This is exactly why the option **defaults to off**: a new broker never emits the pruned schema until an operator explicitly enables it, which should only happen once the whole server fleet is upgraded. A future release can flip the default once a minimum server version is guaranteed.

## Tests

- `UnnestSqlPlannerTest`: flag on/off, source-array-also-selected (no-op), WITH ORDINALITY, multiple arrays, zero-passthrough.
- `PlanNodeSerDeTest`: protobuf round-trip for pruned (non-sequential indexes + ordinality) and legacy `UnnestNode`.
- `UnnestOperatorTest`: pruned single-array, zero-passthrough, WITH ORDINALITY, multiple arrays; legacy path unchanged.
- `UnnestIntegrationTest`: end-to-end pruned-vs-default result equality across single/multi array, WITH ORDINALITY, zero-passthrough, and array-also-selected shapes.

## Notes for reviewers

- UNNEST is only supported on the logical planner path, so the change is confined to `RelToPlanNodeConverter` + the node/operator + serde; the V2 physical path is untouched.
- Default-off; enabling it is a rolling-upgrade-ordering decision (servers before brokers).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
