peter-toth commented on code in PR #10035:
URL:
https://github.com/apache/arrow-datafusion/pull/10035#discussion_r1562648720
##########
datafusion/common/src/tree_node.rs:
##########
@@ -43,17 +42,32 @@ macro_rules! handle_transform_recursion_up {
}};
}
-/// Defines a visitable and rewriteable tree node. This trait is implemented
-/// for plans ([`ExecutionPlan`] and [`LogicalPlan`]) as well as expression
-/// trees ([`PhysicalExpr`], [`Expr`]) in DataFusion.
+/// API for visiting (read only) and rewriting tree data structures .
+///
+/// Note: this trait is implemented for plans ([`ExecutionPlan`] and
+/// [`LogicalPlan`]) as well as expression trees ([`PhysicalExpr`], [`Expr`]).
+///
+/// # Common APIs:
+/// * [`Self::apply`] and [`Self::visit`] for inspecting (without
modification) borrowed nodes
+/// * [`Self::transform`] and [`Self::rewrite`] for rewriting owned nodes
+///
+/// # Terminology
+/// The following terms are used in this trait
+///
+/// * `f_down`: Invoked before any children of the current node are visited.
+/// * `f_up`: Invoked after all children of the current node are visited.
+/// * `f`: closure that is applied to the current node.
+/// * `map_*`: applies a transformation to rewrite owned nodes
+/// * `apply_*`: invokes a function on borrowed nodes
+/// * `transform_`: applies a transformation to rewrite owned nodes
Review Comment:
So, let me add a few comments to the above table.
Previous efforts (especially
https://github.com/apache/arrow-datafusion/pull/8891 but others too:
https://github.com/apache/arrow-datafusion/pull/9876,
https://github.com/apache/arrow-datafusion/pull/9913,
https://github.com/apache/arrow-datafusion/pull/9965,
https://github.com/apache/arrow-datafusion/pull/9999) consolidated the
behaviour of existing APIs (consistent `TreeNodeRecursion` behaviour and common
`Transformed` return types of transforming APIs) but didn't try to normalize
naming of the APIs in favour of causing the least incompatibility with previous
DataFusion versions. So this means that the current API names evolved
organically and thus they might not be intuitive in all cases and are a bit
inconsistent here and there...
Basically, there are 2 kinds of `TreeNode` APIs:
1. Inspecting APIs to taverse over a tree and inspect `&TreeNode`s.
2. Transforming APIs to traverse over a tree while consuming `TreeNode`s and
produce possibly changed `TreeNode`s
(i.e. there is no in-place mutation API, that uses `&mut TreeNode`, but we
are improving the consume and produce APIs to cleate less new objects
unnecessarly e.g.: https://github.com/apache/arrow-datafusion/pull/9999)
There are only 2 inspecting APIs:
- `TreeNode::apply()`:
Top-down (pre-order) traverses the nodes and applies the povided `f`
closure on the nodes.
- As `TreeNode` trees are hierarchical collections of nodes, something
like `foreach()` would better describe the operation.
- Because the API name doesn't explicitely defines the traversal direction
IMO its docs shouldn't say anything about its implementation just the fact that
it should be used for order agnostic inspection of nodes.
- Its transforming counterpart is `TreeNode::transform()`. But
`TreeNode::transform()` is an alias for `TreeNode::transform_up()` so it does a
bottom-up traversal...
- `TreeNode::visit()`:
Requires a `TreeNodeVisitor` that incorporates 2 methods to do a combined
traversal. `TreeNodeVisitor::f_down()` is called in top-down order,
`TreeNodeVisitor::f_up()` is called in bottom-up order on the nodes in one
combined traversal.
- As there is no `apply_up()` exists, `visit()` can be used to do a
bottom-up (pre-order) inspecting traversal.
- Its transforming counterpart is `TreeNode::rewrite()`.
There are 7 transforming APIs:
- `TreeNode::transform()`:
An alias for `TreeNode::transform_up()` so it does bottom-up (post-order)
traversal on the nodes. Consumes them with the provided `f` closure and
produces a possibly transformed node.
- Should be the transforming counterpart of `apply()` but this runs
bottom-up for some reason...
- IMO its docs shouldn't mention traversal direction and probably should
be used for order agnostic transformation of nodes.
- `TreeNode::transform_down()` and `TreeNode::transform_down_mut()`:
These 2 are top-down traverse the nodes, consume them with the provided
`f` closure and produce a possibly transformed node.
- I have no idea why we have the `..._mut()` version, most likely for just
convenience, but IMO one `transform_down()` with `FnMut` should be fair enough.
- `TreeNode::transform_up()` and `TreeNode::transform_up_mut()`:
These 2 are the bottom-up versions of the above.
- I have no idea why we have the `..._mut()` version.
- Don't have an inspecting counterpart
- `TreeNode::transform_down_up()`:
This API accepts 2 closures `f_down` and `f_up`. `f_down` is called in
top-down order and `f_up` is called in bottom-up order on the nodes in one
combined traversal.
- Doesn't have an inspecting counterpart
- `TreeNode::rewrite()`:
Requires a `TreeNodeRewriter` that incorporates 2 methods to do a combined
traversal. `TreeNodeRewriter::f_down()` is called in top-down order,
`TreeNodeRewriter::f_up()` is called in bottom-up order on the nodes in one
combined traversal.
- This API is the transforming counterpart of `TreeNode::visit()`
- If there is no mutable shared data between `TreeNodeRewriter::f_down()`
and `TreeNodeRewriter::f_up()` then it is easier to use
`TreeNode::transform_down_up()` that accepts `f_down` and `f_up` closures
directly and no need to define a `TreeNodeRewriter` instance.
In order the make the above general `TreeNode` APIs work the `TreeNode`
implementations (e.g. `LogicalPlan`, `Expr`, `Arc<dyn ExecutionPlan>`, `Arc<dyn
PhysicalExpr>`, ...) has to implement `TreeNode::apply_children()` for
inspecting and `TreeNode::map_children()` for transforming APIs.
These 2 methods are public, but they should almost never be called
explicitely by the `TreeNode` API users. If, for some reason, the above 2
inspecting + 7 transforming API doesn't cover the recursion needed then these 2
methods can be used to define a custom algorithms.
- I get the `apply_children()` name, but have no idea why we call the other
`map_children()`, `transform_children()` would probably be a better fit...
Besides the above general `TreeNode` APIs we have some additional
`LogicalPlan` APIs.
All the above 2 + 7 APIs have a `LogicalPlan::..._with_subqueries()` version
that do the same as the original `TreeNode` API but also recurse into
subqueries (same type inner `LogicalPlan` trees that are defined in the
expressions of the original `LogicalPlan` tree).
- We could generalize this concept into `TreeNode` but `LogicalPlan` seems
to be only `TreeNode` type that makes use of these APIs.
In order to make the above `LogicalPlan::..._with_subqueries()` APIs work
`LogicalPlan` implements `LogicalPlan::apply_expressions()`,
`LogicalPlan::map_expressions()`, `LogicalPlan::apply_subqueries()` and
`LogicalPlan::map_subqueries()`.
- These 4 are similar to `TreeNode::apply_children()` and
`TreeNode::map_children()` and rarely called directly.
Here is a summary of the current situation:
`TreeNode` APIs summary:
| traversal order | inspecting | transforming |
| --- | --- | --- |
| order agnostic | | `transform()` |
| top-down | `apply()` | `transform_down()`, `transform_down_mut()`|
| bottom-up | | `transform_up()`, `transform_up_mut()` |
| combined with separete `f_down` and `f_up` closures | |
`transform_down_up()` |
| combined with incorporated `f_down()` and `f_up()` methods into an object
| `visit()` + `TreeNodeVisitor` | `rewrite()` + `TreeNodeRewriter` |
Additional `TreeNode` APIs for custom recursion: `apply_children()`,
`map_children()`.
Additional `LogicalPlan` APIs summary:
| traversal order | inspecting | transforming |
| --- | --- | --- |
| order agnostic | | `transform_with_subqueries()` |
| top-down | `apply_with_subqueries()` | `transform_down_with_subqueries()`,
`transform_down_mut_with_subqueries()`|
| bottom-up | | `transform_up_with_subqueries()`,
`transform_up_mut_with_subqueries()` |
| combined with separete `f_down` and `f_up` closures | |
`transform_down_up_with_subqueries()` |
| combined with incorporated `f_down()` and `f_up()` methods into an object
| `visit_with_subqueries()` + `TreeNodeVisitor` | `rewrite_with_subqueries()` +
`TreeNodeRewriter` |
Additional `LogicalPlan` APIs for custom recursion: `apply_expressions()`,
`map_expressions()`, `apply_subqueries()`, `map_subqueries()`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]