zhuqi-lucas opened a new issue, #22583:
URL: https://github.com/apache/datafusion/issues/22583

   ### Is your feature request related to a problem or challenge?
   
   `EliminateCrossJoin::rewrite` in 
`datafusion/optimizer/src/eliminate_cross_join.rs` is called on every plan 
during logical optimization. The rule's body only does real work when the root 
(or its `Filter` child) is an inner `Join`; in every other case it falls 
through to `rewrite_children`, which recurses into the plan, processes 
uncorrelated subqueries, and rewrites every direct child via `map_children` 
(clone-on-write), then calls `recompute_schema` on the way back.
   
   For plans with no joins anywhere in the tree, the rule still:
   - Walks the entire plan via `map_uncorrelated_subqueries` + `map_children`
   - Triggers schema recomputation if any node reported `transformed = true` 
(which happens spuriously on plans like `OneOf` extensions where `map_children` 
always counts as transformed)
   - Allocates fresh `LogicalPlan` nodes on the recursion
   
   This is paid by every query that goes through the logical optimizer 
pipeline, even simple ones like `SELECT ... FROM single_table WHERE ...`. 
Production profiling on a reference-cluster query server shows 
`EliminateCrossJoin` taking a non-trivial share of plan time on point-query 
workloads that have no joins at all.
   
   Related prior work that focused on the rule's `EliminateCrossJoin` hot-path 
itself (#10287, EPIC #9637) reduced allocation when the rule does fire, but did 
not add an early bail-out for the common "no joins in this plan" case.
   
   ### Describe the solution you'd like
   
   Add an `apply` (read-only) scan at the top of `rewrite` that bails out with 
`Transformed::no(plan)` when the tree contains no `LogicalPlan::Join` nodes. 
Equivalent shape to recent fast-paths added in the physical optimizer (e.g. 
#22521 for `ensure_distribution`):
   
   ```rust
   use datafusion_common::tree_node::TreeNodeRecursion;
   
   fn rewrite(&self, plan: LogicalPlan, config: &dyn OptimizerConfig) -> 
Result<Transformed<LogicalPlan>> {
       let mut has_join = false;
       plan.apply(|node| {
           if matches!(node, LogicalPlan::Join(_)) {
               has_join = true;
               Ok(TreeNodeRecursion::Stop)
           } else {
               Ok(TreeNodeRecursion::Continue)
           }
       })?;
       if !has_join {
           return Ok(Transformed::no(plan));
       }
   
       // existing logic unchanged below ...
   }
   ```
   
   `apply` is read-only; it does not clone-on-write the plan and short-circuits 
on the first match. For no-join plans the whole rule then costs one O(N) 
read-only walk. For plans that contain a join the behavior is unchanged.
   
   ### Describe alternatives I've considered
   
   - **Leave it as-is.** The rule is correct, just wasteful on no-join plans. 
Acceptable for clusters whose workload is join-heavy, costly for clusters whose 
workload is mostly point queries on single tables.
   - **Move the check into the `OptimizerRule` dispatcher.** Would need a 
generic "does this rule care about this plan" hook on the trait. Larger surface 
and not clearly worth it for one rule.
   
   ### Additional context
   
   Same pattern of "logical optimizer rule walks the full plan tree even when 
there is nothing to rewrite" likely exists for other rules. If this fast-path 
lands cleanly I am happy to audit the remaining logical-optimizer rules (e.g. 
`DecorrelatePredicateSubquery`, `ExtractEquijoinPredicate`) and submit 
follow-ups in the same shape.
   
   Happy to send the PR. Tagging @2010YOUY01 @alamb who reviewed the analogous 
#22521 fast-path on the physical side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to