peter-toth commented on code in PR #10035:
URL:
https://github.com/apache/arrow-datafusion/pull/10035#discussion_r1562648720
##########
datafusion/common/src/tree_node.rs:
##########
@@ -43,17 +42,32 @@ macro_rules! handle_transform_recursion_up {
}};
}
-/// Defines a visitable and rewriteable tree node. This trait is implemented
-/// for plans ([`ExecutionPlan`] and [`LogicalPlan`]) as well as expression
-/// trees ([`PhysicalExpr`], [`Expr`]) in DataFusion.
+/// API for visiting (read only) and rewriting tree data structures .
+///
+/// Note: this trait is implemented for plans ([`ExecutionPlan`] and
+/// [`LogicalPlan`]) as well as expression trees ([`PhysicalExpr`], [`Expr`]).
+///
+/// # Common APIs:
+/// * [`Self::apply`] and [`Self::visit`] for inspecting (without
modification) borrowed nodes
+/// * [`Self::transform`] and [`Self::rewrite`] for rewriting owned nodes
+///
+/// # Terminology
+/// The following terms are used in this trait
+///
+/// * `f_down`: Invoked before any children of the current node are visited.
+/// * `f_up`: Invoked after all children of the current node are visited.
+/// * `f`: closure that is applied to the current node.
+/// * `map_*`: applies a transformation to rewrite owned nodes
+/// * `apply_*`: invokes a function on borrowed nodes
+/// * `transform_`: applies a transformation to rewrite owned nodes
Review Comment:
Previous efforts (especially
https://github.com/apache/arrow-datafusion/pull/8891 but others too:
https://github.com/apache/arrow-datafusion/pull/9876,
https://github.com/apache/arrow-datafusion/pull/9913,
https://github.com/apache/arrow-datafusion/pull/9965,
https://github.com/apache/arrow-datafusion/pull/9999) consolidated the
behaviour of existing APIs (consistent `TreeNodeRecursion` behaviour and common
`Transformed` return types of transforming APIs) but didn't try to normalize
naming of the APIs in favour of causing the least incompatibility with previous
DataFusion versions. So this means that the current API names evolved
organically and thus they might not be intuitive in all cases and are a bit
inconsistent here and there...
Basically, there are 2 kinds of `TreeNode` APIs:
1. "Inspecting" APIs to taverse over a tree and inspect `&TreeNode`s.
2. "Transforming" APIs to traverse over a tree while consuming `TreeNode`s
and produce possibly changed `TreeNode`s
(i.e. there is no in-place mutation API, that uses `&mut TreeNode`, but we
are improving the consume and produce APIs to cleate less new objects
unnecessarly e.g.: https://github.com/apache/arrow-datafusion/pull/9999)
There are only 2 inspecting `TreeNode` APIs:
- `TreeNode::apply()`:
Top-down (pre-order) traverses the nodes and applies the povided `f`
closure on the nodes.
- As `TreeNode` trees are hierarchical collections of nodes, something
like `foreach()` would better describe the operation.
- Because the API name doesn't explicitely defines the traversal direction
IMO its docs shouldn't say anything about its implementation just the fact that
it should be used for order agnostic inspection of nodes.
- Its transforming counterpart should be `TreeNode::transform()`. But
`TreeNode::transform()` is an alias for `TreeNode::transform_up()` so it does a
bottom-up (post-order) traversal.
- `TreeNode::visit()`:
Requires a `TreeNodeVisitor` that incorporates 2 methods to do a combined
traversal. `TreeNodeVisitor::f_down()` is called in top-down order,
`TreeNodeVisitor::f_up()` is called in bottom-up order on the nodes in one
combined traversal.
- As there is no `apply_up()` exists, `visit()` can be used to do a
bottom-up inspecting traversal (`TreeNodeVisitor::f_down()` can remain its
default implementation, that does nothing).
- Its transforming counterpart is `TreeNode::rewrite()`.
There are 7 transforming `TreeNode` APIs:
- `TreeNode::transform()`:
An alias for `TreeNode::transform_up()` so it does bottom-up traversal on
the nodes. Consumes them with the provided `f` closure and produces a possibly
transformed node.
- Should be the transforming counterpart of `apply()` but this runs
bottom-up for some reason...
- IMO its docs shouldn't mention traversal direction and probably should
be used for order agnostic transformation of nodes.
- `TreeNode::transform_down()` and `TreeNode::transform_down_mut()`:
These 2 are top-down traverse the nodes, consume them with the provided
`f` closure and produce a possibly transformed node.
- I have no idea why we have the `..._mut()` version, most likely for just
convenience, but IMO one `transform_down()` with `FnMut` should be fair enough.
- `TreeNode::transform_up()` and `TreeNode::transform_up_mut()`:
These 2 are the bottom-up versions of the above.
- I have no idea why we have the `..._mut()` version.
- Don't have an inspecting counterpart
- `TreeNode::transform_down_up()`:
This API accepts 2 closures `f_down` and `f_up`. `f_down` is called in
top-down order and `f_up` is called in bottom-up order on the nodes in one
combined traversal.
- Doesn't have an inspecting counterpart
- `TreeNode::rewrite()`:
Requires a `TreeNodeRewriter` that incorporates 2 methods to do a combined
traversal. `TreeNodeRewriter::f_down()` is called in top-down order,
`TreeNodeRewriter::f_up()` is called in bottom-up order on the nodes in one
combined traversal.
- This API is the transforming counterpart of `TreeNode::visit()`
- If there is no mutable shared data between `TreeNodeRewriter::f_down()`
and `TreeNodeRewriter::f_up()` then it is easier to use
`TreeNode::transform_down_up()` that accepts `f_down` and `f_up` closures
directly and no need to define a `TreeNodeRewriter` instance.
Additional `TreeNode` APIs:
- `TreeNode::apply_children()` and `TreeNode::map_children()`:
Building blocks for the above 2 + 7 `TreeNode` APIs. `apply_children()`
inspects, `map_children()` transforms the children of a `TreeNode` node with
the provided `f` closure.
- `TreeNode` implementations (e.g. `LogicalPlan`, `Expr`, `Arc<dyn
ExecutionPlan>`, `Arc<dyn PhysicalExpr>`, ...) have to implement
`apply_children()` for inspecting and `map_children()` for transforming APIs.
- These 2 methods are public, but they should almost never be called
explicitely by the `TreeNode` API users. If, for some reason, the above 2 + 7
APIs don't cover the recursion needed then these 2 methods can be used to
define a custom algorithms.
- The `apply_children()` name is aligned with `apply()`, but have no idea
why we call the other `map_children()`, `transform_children()` would probably
be a better fit...
Besides the above general `TreeNode` APIs we have some `LogicalPlan` APIs:
- `LogicalPlan::..._with_subqueries()`:
All the above 2 + 7 `TreeNode` APIs have a
`LogicalPlan::..._with_subqueries()` version that do the same as the original
`TreeNode` API but also recurse into subqueries (same type inner `LogicalPlan`
trees that are defined in the expressions of the original `LogicalPlan` tree
nodes).
- We could generalize this concept into `TreeNode` but `LogicalPlan` seems
to be only `TreeNode` type that makes use of these APIs.
Additional `LogicalPlan` APIs:
- `LogicalPlan::apply_expressions()`, `LogicalPlan::map_expressions()`,
`LogicalPlan::apply_subqueries()` and `LogicalPlan::map_subqueries()`:
Building blocks for `LogicalPlan::..._with_subqueries()` APIs.
`apply_expressions()` and `apply_subqueries()` inspect, `map_expressions()` and
`map_subqueries()` transform the expressions/subqueries of a `LogicalPlan` node
with the provided `f` closure.
- These 4 are similar to `TreeNode::apply_children()` and
`TreeNode::map_children()` and so they are rarely called directly but can be
used to define a custom algorithms.
Here is the summary of the current situation:
`TreeNode` APIs:
| traversal order | inspecting | transforming |
| --- | --- | --- |
| order agnostic | | `transform()` |
| top-down | `apply()` | `transform_down()`, `transform_down_mut()`|
| bottom-up | | `transform_up()`, `transform_up_mut()` |
| combined with separete `f_down` and `f_up` closures | |
`transform_down_up()` |
| combined with incorporated `f_down()` and `f_up()` methods into an object
| `visit()` + `TreeNodeVisitor` | `rewrite()` + `TreeNodeRewriter` |
Additional `TreeNode` APIs: `apply_children()`, `map_children()`.
`LogicalPlan` APIs:
| traversal order | inspecting | transforming |
| --- | --- | --- |
| order agnostic | | `transform_with_subqueries()` |
| top-down | `apply_with_subqueries()` | `transform_down_with_subqueries()`,
`transform_down_mut_with_subqueries()`|
| bottom-up | | `transform_up_with_subqueries()`,
`transform_up_mut_with_subqueries()` |
| combined with separete `f_down` and `f_up` closures | |
`transform_down_up_with_subqueries()` |
| combined with incorporated `f_down()` and `f_up()` methods into an object
| `visit_with_subqueries()` + `TreeNodeVisitor` | `rewrite_with_subqueries()` +
`TreeNodeRewriter` |
Additional `LogicalPlan` APIs: `apply_expressions()`, `map_expressions()`,
`apply_subqueries()`, `map_subqueries()`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]