peter-toth commented on code in PR #10035:
URL:
https://github.com/apache/arrow-datafusion/pull/10035#discussion_r1563883588
##########
datafusion/common/src/tree_node.rs:
##########
@@ -43,17 +42,32 @@ macro_rules! handle_transform_recursion_up {
}};
}
-/// Defines a visitable and rewriteable tree node. This trait is implemented
-/// for plans ([`ExecutionPlan`] and [`LogicalPlan`]) as well as expression
-/// trees ([`PhysicalExpr`], [`Expr`]) in DataFusion.
+/// API for visiting (read only) and rewriting tree data structures .
+///
+/// Note: this trait is implemented for plans ([`ExecutionPlan`] and
+/// [`LogicalPlan`]) as well as expression trees ([`PhysicalExpr`], [`Expr`]).
+///
+/// # Common APIs:
+/// * [`Self::apply`] and [`Self::visit`] for inspecting (without
modification) borrowed nodes
+/// * [`Self::transform`] and [`Self::rewrite`] for rewriting owned nodes
+///
+/// # Terminology
+/// The following terms are used in this trait
+///
+/// * `f_down`: Invoked before any children of the current node are visited.
+/// * `f_up`: Invoked after all children of the current node are visited.
+/// * `f`: closure that is applied to the current node.
+/// * `map_*`: applies a transformation to rewrite owned nodes
+/// * `apply_*`: invokes a function on borrowed nodes
+/// * `transform_`: applies a transformation to rewrite owned nodes
Review Comment:
BTW, if we wanted to consolidate the `TreeNode` API terminonlogy I would
suggest the following:
| traversal order | inspecting | transforming |
| --- | --- | --- |
| order agnostic | `for_each()` (new API, an alias to `for_each_up()`, but
its docs shouldn't specify any order) | `transform()` (remains an alias to
`transform_up()` but its docs shouldn't specify any order) |
| top-down | `for_each_down()` (new API), `apply()` (deprecated, from now
just an alias to `for_each_down()`) | `transform_down()` (changes to `f: &mut
F` where `F: FnMut(Self) -> Result<Transformed<Self>>`, which is a breaking
change but requires a trivial fix from the users), `transform_down_mut()`
(deprecated, from now just an alias to `transform_down()`) |
| bottom-up | `for_each_up()` (new API) | `transform_up()` (changes to `f:
&mut F` where `F: FnMut(Self) -> Result<Transformed<Self>>`, which is a
breaking change but requires a trivial fix from the users),
`transform_up_mut()` (deprecated, from now just an alias to `transform_up()`) |
| combined with separete `f_down` and `f_up` closures | |
`transform_down_up()` |
| combined with incorporated `f_down()` and `f_up()` methods into an object
| `visit()` + `TreeNodeVisitor` | `rewrite()` + `TreeNodeRewriter` |
\+ Maybe rename `apply_children()` to `for_each_children()` and
`map_children()` to `transform_children()`. Although renaming these would be a
breaking change if something that is built on DataFusion defines a new kind of
`TreeNode`, but fixing them would also be trivial.
\+ `LogicalPlan::..._with_subqueries()` APIs should be alligned with the
above `TreeNode` ones, but as they are fairly new
(https://github.com/apache/arrow-datafusion/pull/9913) they haven't landed in
any release yet.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]