zhuqi-lucas opened a new issue, #22616:
URL: https://github.com/apache/datafusion/issues/22616
### Is your feature request related to a problem or challenge?
A follow-up to #22583 / #22612.
Several recent optimizer perf PRs have added per-rule fast-paths to dodge
wasted work on plans that don't match the rule's shape:
- #22298 — `plan_has_subqueries` gate around `rewrite_with_subqueries`
- #22612 — `plan_has_joins` gate at the top of `EliminateCrossJoin::rewrite`
- Likely more to follow on other rules with similar costs
These are useful local wins but @neilconway and @alamb's comments on #22583
point at the deeper issue: **the ownership-based `TreeNode::rewrite` traversal
copies the Arc spine on every pass, even when the rule doesn't change
anything**. A per-rule "did anything here change?" scan is a stop-gap; the real
fix is to express tree rewrites in a way where untouched subtrees aren't
reallocated to begin with.
@adriangb already laid groundwork in #22298 with a private
`map_children_mut` helper in `datafusion/optimizer/src/optimizer.rs` that
mutates `Arc<LogicalPlan>` children in place via `Arc::make_mut` (zero-cost
when refcount == 1). That helper currently:
- Is only invoked from `rewrite_plan_in_place`, which itself is gated on
`!plan_has_subqueries(plan)` — so the in-place path only kicks in for the
simpler subquery-free case
- Doesn't recurse into the subquery plans referenced from
`Expr::ScalarSubquery` / `InSubquery` / `Exists` / `SetComparison`
- Is `pub(crate)` to the optimizer crate, not reachable from individual rules
### Describe the solution you'd like
Generalize the in-place rewriting infrastructure so that **all** logical
optimizer rules can opt into it, regardless of `ApplyOrder`:
1. **Extend `map_children_mut` to also walk subquery plans.** Add a sibling
`map_subquery_plans_mut` (or fold it into a single
`map_children_and_subqueries_mut`) so the in-place path works on plans that
contain `IN (SELECT ...)` / `EXISTS (...)` etc.
2. **Expose an in-place traversal entry point usable from rules with
`ApplyOrder::None`.** Today `EliminateCrossJoin` and a couple of other rules
drive their own recursion via `rewrite_children` (which uses ownership-based
`map_children` + `map_uncorrelated_subqueries`). Give them an in-place
equivalent so they can recurse without forcing Arc copies when nothing changes.
3. **Decide on the API shape.** Two candidates:
- Add an `OptimizerRule::rewrite_in_place(&mut LogicalPlan, ...) ->
Result<bool>` method alongside the existing by-value `rewrite`. Rules can opt
in; framework falls back to ownership-based otherwise.
- Or keep the by-value `rewrite` contract and have the framework do
`std::mem::take` + write-back transparently (similar to how
`rewrite_plan_in_place` already bridges in optimizer.rs:511).
4. **Once the framework is in place, remove the per-rule fast-paths.**
`plan_has_subqueries` and `plan_has_joins` become redundant when "no-op
rewrites" are already free.
### Describe alternatives you've considered
- **Keep adding per-rule fast-paths.** Easy, contained, and works — but
every new rule pays the same cost and we accumulate `plan_has_X` helpers.
Doesn't scale.
- **Tackle `EliminateCrossJoin` specifically** with in-place rewrite. Mixed
approach: ownership-based for the join restructuring (`flatten_join_inputs` +
`find_inner_join`), in-place for the non-join recursion. Doable (~300 LOC) but
it's a one-off — doesn't help any other rule.
### Additional context
- Discussion: #22583 (comments from @neilconway and @alamb)
- Stop-gap PRs: #22298, #22612
- The `map_children_mut` implementation lives in
`datafusion/optimizer/src/optimizer.rs:391`
- The gating use-site in `rewrite_plan_in_place` is at
`datafusion/optimizer/src/optimizer.rs:498`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]