alamb commented on code in PR #22298:
URL: https://github.com/apache/datafusion/pull/22298#discussion_r3261881015
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -357,6 +360,95 @@ impl TreeNodeRewriter for Rewriter<'_> {
}
}
+/// Rewrites a plan tree in place using `Arc::make_mut` for
+/// copy-on-write semantics on `Arc<LogicalPlan>` children.
+///
+/// This avoids the `Arc::unwrap_or_clone` + `Arc::new` cycle that the
+/// ownership-based `TreeNode::rewrite` performs at every child node.
+/// When the `Arc` refcount is 1 (always true in the optimizer),
+/// `Arc::make_mut` is essentially free.
+///
+/// The `rule.rewrite()` API takes ownership, so we bridge the `&mut` to an
Review Comment:
this is an implementation detail -- I think it belongs next to the code below
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -357,6 +360,95 @@ impl TreeNodeRewriter for Rewriter<'_> {
}
}
+/// Rewrites a plan tree in place using `Arc::make_mut` for
+/// copy-on-write semantics on `Arc<LogicalPlan>` children.
+///
+/// This avoids the `Arc::unwrap_or_clone` + `Arc::new` cycle that the
+/// ownership-based `TreeNode::rewrite` performs at every child node.
+/// When the `Arc` refcount is 1 (always true in the optimizer),
Review Comment:
Why is this always true in the optimizer?
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -357,6 +360,95 @@ impl TreeNodeRewriter for Rewriter<'_> {
}
}
+/// Rewrites a plan tree in place using `Arc::make_mut` for
+/// copy-on-write semantics on `Arc<LogicalPlan>` children.
+///
+/// This avoids the `Arc::unwrap_or_clone` + `Arc::new` cycle that the
+/// ownership-based `TreeNode::rewrite` performs at every child node.
+/// When the `Arc` refcount is 1 (always true in the optimizer),
+/// `Arc::make_mut` is essentially free.
+///
+/// The `rule.rewrite()` API takes ownership, so we bridge the `&mut` to an
+/// owned value with [`std::mem::take`]. `LogicalPlan::default()` is a cheap
+/// empty placeholder (shared empty schema, no allocation) and is overwritten
+/// with the rule's output on the very next line.
+#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
+fn rewrite_plan_in_place(
+ plan: &mut LogicalPlan,
+ apply_order: ApplyOrder,
+ rule: &dyn OptimizerRule,
+ config: &dyn OptimizerConfig,
+) -> Result<bool> {
+ // f_down phase
+ let mut changed = false;
+ if apply_order == ApplyOrder::TopDown {
+ let owned = std::mem::take(plan);
+ let result = rule.rewrite(owned, config)?;
+ *plan = result.data;
+ changed |= result.transformed;
+ // Respect TreeNodeRecursion::Stop/Jump from the rule
+ if result.tnr == TreeNodeRecursion::Stop {
+ return Ok(changed);
+ }
+ }
+
+ // Recurse into children using Arc::make_mut (zero-cost when refcount == 1)
+ changed |= plan.map_children_mut(|child| {
+ rewrite_plan_in_place(child, apply_order, rule, config)
+ })?;
+
+ // f_up phase
+ if apply_order == ApplyOrder::BottomUp {
+ let owned = std::mem::take(plan);
+ let result = rule.rewrite(owned, config)?;
+ *plan = result.data;
+ changed |= result.transformed;
+ }
+
+ Ok(changed)
+}
+
+/// Returns true if the plan contains any subquery expressions
+/// (EXISTS, IN subquery, scalar subquery, set comparison).
+///
+/// Used to determine whether the more expensive `rewrite_with_subqueries`
+/// traversal is needed. When the plan has no subqueries, the cheaper
+/// `rewrite` traversal is sufficient since all plan nodes are reachable
+/// via direct children.
+fn plan_has_subqueries(plan: &LogicalPlan) -> bool {
Review Comment:
it is unfortunate to have to walk the tree twice, but that does seem better
than copying it in non subquery cases
It isn't clear to me why we couldn't do a similar "rewrite in place" trick
with Expr / PhysicalExprs as well, to avoid the owned rewrite path entirely 🤔
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -357,6 +360,95 @@ impl TreeNodeRewriter for Rewriter<'_> {
}
}
+/// Rewrites a plan tree in place using `Arc::make_mut` for
+/// copy-on-write semantics on `Arc<LogicalPlan>` children.
+///
+/// This avoids the `Arc::unwrap_or_clone` + `Arc::new` cycle that the
+/// ownership-based `TreeNode::rewrite` performs at every child node.
+/// When the `Arc` refcount is 1 (always true in the optimizer),
+/// `Arc::make_mut` is essentially free.
+///
+/// The `rule.rewrite()` API takes ownership, so we bridge the `&mut` to an
+/// owned value with [`std::mem::take`]. `LogicalPlan::default()` is a cheap
+/// empty placeholder (shared empty schema, no allocation) and is overwritten
+/// with the rule's output on the very next line.
+#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
+fn rewrite_plan_in_place(
+ plan: &mut LogicalPlan,
+ apply_order: ApplyOrder,
+ rule: &dyn OptimizerRule,
+ config: &dyn OptimizerConfig,
+) -> Result<bool> {
+ // f_down phase
+ let mut changed = false;
+ if apply_order == ApplyOrder::TopDown {
+ let owned = std::mem::take(plan);
+ let result = rule.rewrite(owned, config)?;
Review Comment:
If this returns early, the plan is left with the default `LogicalPlan` -- I
think we should make it clear in the comments what the expected semantics are
when the rule fails (aka the tree is left in an inconsistent / errored state)
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -357,6 +360,95 @@ impl TreeNodeRewriter for Rewriter<'_> {
}
}
+/// Rewrites a plan tree in place using `Arc::make_mut` for
+/// copy-on-write semantics on `Arc<LogicalPlan>` children.
+///
+/// This avoids the `Arc::unwrap_or_clone` + `Arc::new` cycle that the
+/// ownership-based `TreeNode::rewrite` performs at every child node.
+/// When the `Arc` refcount is 1 (always true in the optimizer),
+/// `Arc::make_mut` is essentially free.
+///
+/// The `rule.rewrite()` API takes ownership, so we bridge the `&mut` to an
+/// owned value with [`std::mem::take`]. `LogicalPlan::default()` is a cheap
+/// empty placeholder (shared empty schema, no allocation) and is overwritten
+/// with the rule's output on the very next line.
+#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
+fn rewrite_plan_in_place(
+ plan: &mut LogicalPlan,
+ apply_order: ApplyOrder,
+ rule: &dyn OptimizerRule,
+ config: &dyn OptimizerConfig,
+) -> Result<bool> {
+ // f_down phase
+ let mut changed = false;
+ if apply_order == ApplyOrder::TopDown {
+ let owned = std::mem::take(plan);
+ let result = rule.rewrite(owned, config)?;
+ *plan = result.data;
+ changed |= result.transformed;
+ // Respect TreeNodeRecursion::Stop/Jump from the rule
+ if result.tnr == TreeNodeRecursion::Stop {
+ return Ok(changed);
+ }
+ }
+
+ // Recurse into children using Arc::make_mut (zero-cost when refcount == 1)
+ changed |= plan.map_children_mut(|child| {
Review Comment:
I wonder if we could eventually consider making a TreeNode API that uses the
same trick to mutate the plans in place, rather than forcing a copy
##########
datafusion/expr/src/logical_plan/tree_node.rs:
##########
@@ -393,6 +394,113 @@ macro_rules! handle_transform_recursion {
}
impl LogicalPlan {
+ /// Applies `f` to each child (input) of this plan node in place,
+ /// using [`Arc::make_mut`] for copy-on-write semantics.
+ ///
+ /// When the `Arc` refcount is 1 (the common case in the optimizer),
+ /// `Arc::make_mut` returns a `&mut` reference without cloning.
+ /// When the refcount is >1, it clones the inner value first.
+ ///
+ /// Returns `Ok(true)` if any child was modified by `f`.
+ pub fn map_children_mut<F: FnMut(&mut Self) -> Result<bool>>(
Review Comment:
It would be great if we could move this into TreeNode eventually
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]