peter-toth commented on code in PR #10035:
URL: 
https://github.com/apache/arrow-datafusion/pull/10035#discussion_r1562648720


##########
datafusion/common/src/tree_node.rs:
##########
@@ -43,17 +42,32 @@ macro_rules! handle_transform_recursion_up {
     }};
 }
 
-/// Defines a visitable and rewriteable tree node. This trait is implemented
-/// for plans ([`ExecutionPlan`] and [`LogicalPlan`]) as well as expression
-/// trees ([`PhysicalExpr`], [`Expr`]) in DataFusion.
+/// API for visiting (read only) and rewriting tree data structures .
+///
+/// Note: this trait is implemented for plans ([`ExecutionPlan`] and
+/// [`LogicalPlan`]) as well as expression trees ([`PhysicalExpr`], [`Expr`]).
+///
+/// # Common APIs:
+/// * [`Self::apply`] and [`Self::visit`] for inspecting (without 
modification) borrowed nodes
+/// * [`Self::transform`] and [`Self::rewrite`] for rewriting owned nodes
+///
+/// # Terminology
+/// The following terms are used in this trait
+///
+/// * `f_down`: Invoked before any children of the current node are visited.
+/// * `f_up`: Invoked after all children of the current node are visited.
+/// * `f`: closure that is applied to the current node.
+/// * `map_*`: applies a transformation to rewrite owned nodes
+/// * `apply_*`:  invokes a function on borrowed nodes
+/// * `transform_`: applies a transformation to rewrite owned nodes

Review Comment:
   Previous efforts (especially 
https://github.com/apache/arrow-datafusion/pull/8891 but others too: 
https://github.com/apache/arrow-datafusion/pull/9876, 
https://github.com/apache/arrow-datafusion/pull/9913, 
https://github.com/apache/arrow-datafusion/pull/9965, 
https://github.com/apache/arrow-datafusion/pull/9999) consolidated the 
behaviour of existing APIs (consistent `TreeNodeRecursion` behaviour and common 
`Transformed` return types of transforming APIs) but didn't try to normalize 
naming of the APIs in favour of causing the least incompatibility with previous 
DataFusion versions. So this means that the current API names evolved 
organically and thus they might not be intuitive in all cases and are a bit 
inconsistent here and there...
   
   Basically, there are 2 kinds of `TreeNode` APIs:
   1. "Inspecting" APIs to taverse over a tree and inspect `&TreeNode`s. 
   2. "Transforming" APIs to traverse over a tree while consuming `TreeNode`s 
and produce possibly changed `TreeNode`s
     (i.e. there is no in-place mutation API, that uses `&mut TreeNode`, but we 
are improving the consume and produce APIs to cleate less new objects 
unnecessarly e.g.: https://github.com/apache/arrow-datafusion/pull/9999)
   
   There are only 2 inspecting `TreeNode` APIs:
   - `TreeNode::apply()`:
     Top-down (pre-order) traverses the nodes and applies the povided `f` 
closure on the nodes.
     - As `TreeNode` trees are hierarchical collections of nodes, something 
like `foreach()` would better describe the operation.
     - Because the API name doesn't explicitely defines the traversal direction 
IMO its docs shouldn't say anything about its implementation just the fact that 
it should be used for order agnostic inspection of nodes.
     - Its transforming counterpart should be `TreeNode::transform()`. But 
`TreeNode::transform()` is an alias for `TreeNode::transform_up()` so it does a 
bottom-up (post-order) traversal.
   - `TreeNode::visit()`:
     Requires a `TreeNodeVisitor` that incorporates 2 methods to do a combined 
traversal. `TreeNodeVisitor::f_down()` is called in top-down order, 
`TreeNodeVisitor::f_up()` is called in bottom-up order on the nodes in one 
combined traversal.
     - As there is no `apply_up()` exists, `visit()` can be used to do a 
bottom-up inspecting traversal (`TreeNodeVisitor::f_down()` can remain its 
default implementation, that does nothing).
     - Its transforming counterpart is `TreeNode::rewrite()`.
   
   There are 7 transforming `TreeNode` APIs:
   - `TreeNode::transform()`:
     An alias for `TreeNode::transform_up()` so it does bottom-up traversal on 
the nodes. Consumes them with the provided `f` closure and produces a possibly 
transformed node.
     - Should be the transforming counterpart of `apply()` but this runs 
bottom-up for some reason...
     - IMO its docs shouldn't mention traversal direction and probably should 
be used for order agnostic transformation of nodes.
   - `TreeNode::transform_down()` and `TreeNode::transform_down_mut()`:
     These 2 are top-down traverse the nodes, consume them with the provided 
`f` closure and produce a possibly transformed node.
     - I have no idea why we have the `..._mut()` version, most likely for just 
convenience, but IMO one `transform_down()` with `FnMut` should be fair enough.
   - `TreeNode::transform_up()` and `TreeNode::transform_up_mut()`:
     These 2 are the bottom-up versions of the above.
     - I have no idea why we have the `..._mut()` version.
     - Don't have an inspecting counterpart
   - `TreeNode::transform_down_up()`:
     This API accepts 2 closures `f_down` and `f_up`. `f_down` is called in 
top-down order and `f_up` is called in bottom-up order on the nodes in one 
combined traversal.
     - Doesn't have an inspecting counterpart
   - `TreeNode::rewrite()`:
     Requires a `TreeNodeRewriter` that incorporates 2 methods to do a combined 
traversal. `TreeNodeRewriter::f_down()` is called in top-down order, 
`TreeNodeRewriter::f_up()` is called in bottom-up order on the nodes in one 
combined traversal.
     - This API is the transforming counterpart of `TreeNode::visit()`
     - If there is no mutable shared data between `TreeNodeRewriter::f_down()` 
and `TreeNodeRewriter::f_up()` then it is easier to use 
`TreeNode::transform_down_up()` that accepts `f_down` and `f_up` closures 
directly and no need to define a `TreeNodeRewriter` instance.
    
   Additional `TreeNode` APIs:
   - `TreeNode::apply_children()` and `TreeNode::map_children()`:
     In order the make the above general `TreeNode` APIs work the `TreeNode` 
implementations (e.g. `LogicalPlan`, `Expr`, `Arc<dyn ExecutionPlan>`, `Arc<dyn 
PhysicalExpr>`, ...) have to implement `TreeNode::apply_children()` for 
inspecting and `TreeNode::map_children()` for transforming APIs.
     - `apply_children()` inspects, `map_children()` transforms the children of 
a `TreeNode` node with the provided `f` closure.
     - These 2 methods are public, but they should almost never be called 
explicitely by the `TreeNode` API users. If, for some reason, the above 2 + 7 
APIs don't cover the recursion needed then these 2 methods can be used to 
define a custom algorithms.
     - The `apply_children()` name is aligned with `apply()`, but have no idea 
why we call the other `map_children()`, `transform_children()` would probably 
be a better fit...
   
   Besides the above general `TreeNode` APIs we have some `LogicalPlan` APIs:
   - `LogicalPlan::..._with_subqueries()`:
     All the above 2 + 7 `TreeNode` APIs have a 
`LogicalPlan::..._with_subqueries()` version that do the same as the original 
`TreeNode` API but also recurse into subqueries (same type inner `LogicalPlan` 
trees that are defined in the expressions of the original `LogicalPlan` tree 
nodes).
     - We could generalize this concept into `TreeNode` but `LogicalPlan` seems 
to be only `TreeNode` type that makes use of these APIs.
   
   Additional `LogicalPlan` APIs:
   - `LogicalPlan::apply_expressions()`, `LogicalPlan::map_expressions()`, 
`LogicalPlan::apply_subqueries()` and `LogicalPlan::map_subqueries()`:
     In order to make the above `LogicalPlan::..._with_subqueries()` APIs work 
`LogicalPlan` implements them as building blocks.
     - `apply_expressions()` and `apply_subqueries()` inspect, 
`map_expressions()` and `map_subqueries()` transform the expressions/subqueries 
of a `LogicalPlan` node with the provided `f` closure.
     - These 4 are similar to `TreeNode::apply_children()` and 
`TreeNode::map_children()` and so they are rarely called directly but can be 
used to define a custom algorithms.
   
   Here is the summary of the current situation:
   
   `TreeNode` APIs:
   | traversal order | inspecting | transforming |
   | --- | --- | --- |
   | order agnostic  | | `transform()` |
   | top-down | `apply()` | `transform_down()`, `transform_down_mut()`|
   | bottom-up | | `transform_up()`, `transform_up_mut()` |
   | combined with separete `f_down` and `f_up` closures | | 
`transform_down_up()` |
   | combined with incorporated `f_down()` and `f_up()` methods into an object 
| `visit()` + `TreeNodeVisitor` | `rewrite()` + `TreeNodeRewriter` |
   
   Additional `TreeNode` APIs: `apply_children()`, `map_children()`.
   
   `LogicalPlan` APIs:
   | traversal order | inspecting | transforming |
   | --- | --- | --- |
   | order agnostic  | | `transform_with_subqueries()` |
   | top-down | `apply_with_subqueries()` | `transform_down_with_subqueries()`, 
`transform_down_mut_with_subqueries()`|
   | bottom-up | | `transform_up_with_subqueries()`, 
`transform_up_mut_with_subqueries()` |
   | combined with separete `f_down` and `f_up` closures | | 
`transform_down_up_with_subqueries()` |
   | combined with incorporated `f_down()` and `f_up()` methods into an object 
| `visit_with_subqueries()` + `TreeNodeVisitor` | `rewrite_with_subqueries()` + 
`TreeNodeRewriter` |
   
   Additional `LogicalPlan` APIs: `apply_expressions()`, `map_expressions()`, 
`apply_subqueries()`, `map_subqueries()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to