zhuqi-lucas opened a new pull request, #22612:
URL: https://github.com/apache/datafusion/pull/22612
## Which issue does this PR close?
Closes #22583.
## Rationale for this change
`EliminateCrossJoin::rewrite` is called on every plan during logical
optimization. The rule's body only does real work when the root (or its
`Filter` child) is an inner `Join`; in every other case it falls through to
`rewrite_children`, which recurses into the plan, processes uncorrelated
subqueries, and rewrites every direct child via `map_children`
(clone-on-write), then calls `recompute_schema` on the way back.
This is paid by every query in the logical optimizer pipeline — including
simple point queries with no joins anywhere in the tree.
## Discussion on the issue
@neilconway raised the valid concern that a fast-path scan still does *some*
up-front work in the case where the rewrite does fire, and that the deeper fix
is mutable tree rewrites (avoiding the clone-on-write of `TreeNode::rewrite`
entirely). @alamb agreed and pointed at the in-place `map_children_mut` /
`plan_has_subqueries` infrastructure adriangb landed in #22298 as the existing
precedent.
This PR follows that precedent directly:
- **Same shape as `plan_has_subqueries`** — a read-only `apply` scan,
early-stops on the first matching node, allocates nothing.
- The scan cost on a query that *does* have joins is O(depth-to-first-join)
— typically a handful of nodes, well below the cost of even one `map_children`
clone-on-write the rewrite would otherwise do.
- For the deeper "in-place mutable rewrite" direction, `rewrite_children`
here recurses via `optimizer.rewrite(input, config)` per child — a different
shape from `map_children_mut`'s `&mut` traversal. Adapting that is a larger
refactor and worth its own follow-up; this PR doesn't block it.
## What changes are included in this PR?
- New `plan_has_joins(&LogicalPlan) -> bool` helper in
`eliminate_cross_join.rs` — `apply` walk that returns `true` on the first
`LogicalPlan::Join` it sees.
- Fast-path at the top of `EliminateCrossJoin::rewrite`: `if
!plan_has_joins(&plan) { return Ok(Transformed::no(plan)); }`. Everything else
is unchanged.
## Are these changes tested?
Four new unit tests in the existing `mod tests`:
- `plan_has_joins_detects_root_join`
- `plan_has_joins_detects_nested_join` (Join under Filter/Projection)
- `plan_has_joins_returns_false_for_join_free_plan`
- `rewrite_short_circuits_when_plan_has_no_joins` — end-to-end: rule's
`rewrite` returns `Transformed::no` and the plan comes back identical (schema +
display) on join-free input.
The existing 20 `EliminateCrossJoin` tests + the full 708-test
`datafusion-optimizer --lib` suite still pass. `cargo clippy -p
datafusion-optimizer --all-targets -- -D warnings` clean.
## Are there any user-facing changes?
No semantic change. Pure perf optimization, no new config knobs.
## Follow-ups
- A deeper architectural improvement (mutable tree rewrites following the
`map_children_mut` pattern from #22298) is worth considering — see issue #22583
comments for discussion. Out of scope here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]