Dandandan opened a new pull request, #22389:
URL: https://github.com/apache/datafusion/pull/22389
## Summary
Long chains of `ProjectionExec`s (and their `LogicalPlan::Projection`
counterparts) — e.g. queries shaped like:
```sql
WITH s0 AS (SELECT ... FROM t LEFT JOIN ...),
s1 AS (SELECT *, <CASE-ladder over s0.col> AS d1 FROM s0),
s2 AS (SELECT *, <CASE-ladder over s0.col> AS d2 FROM s1),
...
sN AS (SELECT *, <CASE-ladder over s0.col> AS dN FROM s_{N-1})
SELECT * FROM sN
```
were being collapsed one level at a time. This forced **O(N) intermediate
`ProjectionExec` constructions** (each recomputing equivalence properties
through its projection mapping) on the physical side, and **O(N) re-runs of the
`OptimizeProjections` rule** by the outer optimizer fixpoint loop on the
logical side.
## Fix
**Physical** (`datafusion/physical-plan/src/projection.rs`):
- Replaced the pairwise recursion inside
`ProjectionExec::try_swapping_with_projection` with
`try_collapse_projection_chain`, which walks the entire run of consecutive
`ProjectionExec`s iteratively and builds **one** final `ProjectionExec` instead
of N-1 intermediates.
- The `is_projection_removable` / merge-is-beneficial guards from the old
`try_unifying_projections` are preserved at each step.
- Leaf pushdown into a non-`Projection` input (e.g. `DataSourceExec`) is
preserved by handing the unified projection back to
`remove_unnecessary_projections` once at the end.
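
The iterative walk described in the bullets above can be sketched with a toy model. This is not the DataFusion implementation: `Expr`, `Projection`, `substitute`, `merge_is_beneficial`, and `collapse_chain` are hypothetical names, and the guard shown is a simplified assumption standing in for the real removability and merge-is-beneficial checks. The sketch only illustrates the control flow: fold the whole chain into one projection in a loop, checking a guard at each step, and construct a single result at the end.

```rust
/// Toy expression over the columns of a projection's input (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// A projection is modeled as a list of output expressions.
type Projection = Vec<Expr>;

/// Rewrite `expr`, written against the output of `inner`, so that it refers
/// to the input of `inner` instead. Assumes column indices are in range.
fn substitute(expr: &Expr, inner: &Projection) -> Expr {
    match expr {
        Expr::Column(i) => inner[*i].clone(),
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Add(l, r) => Expr::Add(
            Box::new(substitute(l, inner)),
            Box::new(substitute(r, inner)),
        ),
    }
}

/// Count how often each column of the inner projection is referenced.
fn count_refs(expr: &Expr, counts: &mut [usize]) {
    match expr {
        Expr::Column(i) => counts[*i] += 1,
        Expr::Literal(_) => {}
        Expr::Add(l, r) => {
            count_refs(l, counts);
            count_refs(r, counts);
        }
    }
}

/// Simplified guard (an assumption, not the real check): refuse to merge
/// when a non-trivial inner expression would be duplicated by inlining.
fn merge_is_beneficial(outer: &Projection, inner: &Projection) -> bool {
    let mut counts = vec![0usize; inner.len()];
    for e in outer {
        count_refs(e, &mut counts);
    }
    counts
        .iter()
        .zip(inner)
        .all(|(n, e)| *n <= 1 || matches!(e, Expr::Column(_) | Expr::Literal(_)))
}

/// Walk a chain of projections (outermost first) iteratively and return the
/// unified projection plus the number of layers merged into it.
fn collapse_chain(chain: &[Projection]) -> Option<(Projection, usize)> {
    let (outer, rest) = chain.split_first()?;
    let mut unified = outer.clone();
    let mut merged = 0;
    for inner in rest {
        if !merge_is_beneficial(&unified, inner) {
            break; // stop at the first layer the guard rejects
        }
        unified = unified.iter().map(|e| substitute(e, inner)).collect();
        merged += 1;
    }
    Some((unified, merged))
}
```

In this model no intermediate projection node is ever built; only the expression list is rewritten per layer, which is the property the physical-side change relies on.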
**Logical** (`datafusion/optimizer/src/optimize_projections/mod.rs`):
- Wrapped `merge_consecutive_projections` in an internal loop so an N-deep
chain of `LogicalPlan::Projection` collapses in a single rule application.
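
A minimal sketch of the "loop inside the rule" idea follows. The `Plan` type and the `merge_pair` and `merge_chain` functions are hypothetical and heavily simplified (the toy merge just keeps the outer column list); the point is only that repeating the pairwise merge inside one call removes the need for the outer fixpoint loop to re-run the rule once per level.

```rust
/// Toy logical plan (hypothetical): a scan or a projection over an input.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Projection { columns: Vec<String>, input: Box<Plan> },
}

/// One pairwise merge: fuse a projection with a projection directly below it.
/// Returns the (possibly rewritten) plan and whether anything changed.
fn merge_pair(plan: Plan) -> (Plan, bool) {
    match plan {
        Plan::Projection { columns, input } => match *input {
            // Toy rule: the outer column list wins and the child layer is dropped.
            Plan::Projection { input: grandchild, .. } => (
                Plan::Projection { columns, input: grandchild },
                true,
            ),
            other => (
                Plan::Projection { columns, input: Box::new(other) },
                false,
            ),
        },
        other => (other, false),
    }
}

/// Repeat the pairwise merge until it stops changing the plan, so an N-deep
/// chain collapses in one call. Also reports how many merges happened.
fn merge_chain(mut plan: Plan) -> (Plan, usize) {
    let mut merges = 0;
    loop {
        let (next, changed) = merge_pair(plan);
        plan = next;
        if !changed {
            return (plan, merges);
        }
        merges += 1;
    }
}
```

For a chain of four projections over a scan, `merge_chain` performs three merges within a single call, whereas calling `merge_pair` once per optimizer pass would need that many passes plus one more to detect that nothing changes.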
## Benchmark
New `physical_plan_chained_case_projection_hotspot` bench in
`sql_planner_extended.rs`. Models the shape above: LEFT JOIN + 30-wide OR
filter + 80 chained CTE projections, each adding one column defined by a
depth-23 nested `CASE` ladder over the same input column.
| Config | Before | After | Δ |
|---|---:|---:|---|
| N=80, depth=23, OR=30 (hotspot) | 623 ms | **364 ms** | **−41.5%** (CI [−41.83%, −41.33%], p<0.05) |
| N=80, depth=45 (parser ceiling) | (multi-second, was the original blowup) | **~1.5 s** | n/a |
A small sweep group (`physical_plan_chained_case_projection_sweep`) covers
steps ∈ {10, 20, 40} × depth ∈ {5, 10, 15} for tracking regressions in this
shape over time.
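
For readers who want to reproduce the query shape by hand, here is a rough sketch of a generator. It is not the builder used in `sql_planner_extended.rs`: the function names, the table `t`, the column `c0`, and the `CASE` predicates are all assumptions, and the LEFT JOIN and OR filter from the benchmark are omitted to keep the example short.

```rust
/// Build a nested CASE ladder of the given depth over one column.
/// The predicates (`col > level`) are placeholders, not the benchmark's.
fn case_ladder(col: &str, depth: usize) -> String {
    let mut expr = String::from("NULL");
    // Build from the innermost level outwards.
    for level in (0..depth).rev() {
        expr = format!(
            "CASE WHEN {} > {} THEN {} ELSE {} END",
            col, level, level, expr
        );
    }
    expr
}

/// Build `steps` chained CTEs, each adding one column defined by a CASE
/// ladder of the given depth over the same input column.
fn chained_cte_query(steps: usize, depth: usize) -> String {
    let mut sql = String::from("WITH s0 AS (SELECT * FROM t)");
    for i in 1..=steps {
        sql.push_str(&format!(
            ",\ns{} AS (SELECT *, {} AS d{} FROM s{})",
            i,
            case_ladder("c0", depth),
            i,
            i - 1
        ));
    }
    sql.push_str(&format!("\nSELECT * FROM s{}", steps));
    sql
}
```

Calling `chained_cte_query(80, 23)` yields a query with the same number of steps and nesting depth as the hotspot configuration in the table, minus the join and filter.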
## Verification
- `cargo test -p datafusion-physical-plan --lib projection`: 24/24 passing
- `cargo test -p datafusion-optimizer --lib optimize_projections`: 54/54 passing
- `cargo test -p datafusion --test core_integration physical_optimizer::projection_pushdown`: 23/23 passing (no snapshots changed)
## Test plan
- [x] Existing projection unit tests
- [x] Existing physical optimizer integration tests
- [ ] CI runs the new bench
🤖 Generated with [Claude Code](https://claude.com/claude-code)