zhuqi-lucas opened a new pull request, #22728:
URL: https://github.com/apache/datafusion/pull/22728

   ## Which issue does this PR close?
   
   Closes #22616.
   
   ## Rationale for this change
   
   #22298 landed the in-place `Arc::make_mut` traversal (`map_children_mut` + 
`rewrite_plan_in_place`) but the driver only used it when 
`plan_has_subqueries(&plan) == false`. Two cases still went through the 
ownership-based path that destructures + rebuilds each `LogicalPlan` variant on 
every recursive layer:
   
   1. **Plans that contain subqueries** — the driver fell back to 
`rewrite_with_subqueries`, even on rules that just wanted to descend.
   2. **Rules with `apply_order = None`** (drive their own recursion: 
`EliminateCrossJoin`, `CommonSubexprEliminate`) — never reached the in-place 
path at all.
   
   Profile-guided motivation: `samply` on `sql_planner` showed ~17% of active 
CPU in `Vec / Box / Arc TreeNodeContainer` apply/map walks, dominated by these 
owership-based recursion layers. Per discussion on #22583 with @neilconway and 
@alamb the "right" fix is to make the in-place path generic so per-rule 
fast-paths like #22612 stop being necessary.
   
   ## What changes are included in this PR?
   
   Two pieces:
   
   ### 1. `map_children_and_subqueries_mut`
   
   Extends `map_children_mut` (currently only walks direct `Arc<LogicalPlan>` 
children) with a subquery-plan descent. Direct children stay on the in-place 
path (`Arc::make_mut`, zero-cost when refcount = 1); the subquery side briefly 
borrows the plan via `std::mem::take` to call ownership-based 
`LogicalPlan::map_subqueries`. `map_subqueries`'s per-node 
`has_subquery_expressions` fast-path keeps subquery-free nodes allocation-free, 
so the only allocation cost is at the (rare) nodes that actually carry `IN 
(SELECT ...)` / `EXISTS (...)` etc.
   
   With this, `rewrite_plan_in_place` no longer needs the `plan_has_subqueries` 
gate the driver was using. Removes:
   - `plan_has_subqueries` (now unused)
   - The `TreeNodeRewriter`-based `Rewriter` shim
   - The `has_subqueries ? rewrite_with_subqueries : rewrite_plan_in_place` 
branch in `Optimizer::optimize_internal`
   
   ### 2. `rewrite_children_in_place`
   
   In-place equivalent of the `map_uncorrelated_subqueries` + `map_children` 
pattern that rules with `apply_order = None` use to drive their own recursion. 
Two such rules in-tree, both converted:
   
   - `EliminateCrossJoin::rewrite_children`
   - `CommonSubexprEliminate::rewrite` (the no-special-case arm)
   
   ### Net effect
   
   When a rule returns `Transformed::no` for a child, the parent `LogicalPlan` 
variant is no longer destructured-and-rebuilt at every recursive layer — the 
child's `Arc` is reused via `Arc::make_mut`.
   
   ## Are these changes tested?
   
   - 711 optimizer unit tests pass (no count change).
   - SLT broad sweep — `subquery`, `cse`, `joins`, `predicates`, `aggregate`, 
`tpch` (16 files) — all pass.
   - `cargo clippy -p datafusion-optimizer --all-targets -- -D warnings` clean.
   
   ## Are there any user-facing changes?
   
   No semantic change. Pure perf optimization — no new APIs, no config knobs.
   
   ## Net diff
   
   -28 lines (the `Rewriter` + `plan_has_subqueries` removal together outweigh 
the two new helpers).
   
   ## Follow-up — #22612 fast-path can probably go
   
   `EliminateCrossJoin`'s `plan_has_joins` fast-path (#22612) was a stop-gap 
that compensated for the ownership-based recursion. With this PR, the no-op 
path is naturally cheap; #22612 likely becomes redundant. I'll measure on the 
bot before proposing the revert.
   
   ## Status: draft
   
   Drafting to get bot benchmark numbers. Expecting positive on `sql_planner` 
(`logical_plan_tpch_all`, `optimizer_tpch_all`, `physical_plan_tpch_all`) — the 
workloads where ownership-based recursion shows up most.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to