Re: [PR] Cleanup TreeNode implementations [arrow-datafusion]

via GitHub Fri, 29 Dec 2023 10:26:54 -0800


peter-toth commented on code in PR #8672:
URL: https://github.com/apache/arrow-datafusion/pull/8672#discussion_r1438363204



##########
datafusion/expr/src/tree_node/expr.rs:
##########
@@ -24,16 +24,13 @@ use crate::expr::{
 };
 use crate::{Expr, GetFieldAccess};
 
-use datafusion_common::tree_node::{TreeNode, VisitRecursion};
+use datafusion_common::tree_node::TreeNode;
 use datafusion_common::{internal_err, DataFusionError, Result};
 
 impl TreeNode for Expr {
-    fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion>
-    where
-        F: FnMut(&Self) -> Result<VisitRecursion>,
-    {
-        let children = match self {
-            Expr::Alias(Alias{expr,..})
+    fn children_nodes(&self) -> Vec<Self> {

Review Comment:
   This PR is a nice refactor but I'm not sure it is needed at all (1.) and I'm 
also not sure cloning a node's children into a `Vec` is good direction (2.). 
Let me share my thoughts on these 2.
   
   1. I think in DataFusion there are only 4 real tree types: `LogicalPlan`, 
`Expr`, `ExecutionPlan` and `PhysicalExpr` and there 7 special tree types based 
on the above 4: `SortPushDown`, `PlanWithRequitements`, `ExprOrdering`, 
`OrderPreservationContext`, `PipelineStatePropagator`, 
`PlanWithCorrespondingCoalescePartitions` and `PlanWithCorrespondingSort`.
   The only reason why DataFusion currently has these 7 special trees is to be 
able to propagate down/up some additional information during tree 
transformations. But I think these special trees are simply not needed and can 
be removed. If we add some transformation helper functions like 
`TreeNode.transform_down_with_payload()` and 
`TreeNode.transform_up_with_payload()` we can easily remove these 7 special 
types completely.
     https://github.com/apache/arrow-datafusion/issues/8663 is about this issue 
and I removed `SortPushDown`, `PlanWithRequitements` and `ExprOrdering` as 
exaples in this PR: https://github.com/apache/arrow-datafusion/pull/8664. 
   
   2. If we just focus on the 4 base tree types we can distingsuish 2 different 
types: `Expr` uses `Box`es but the other 3 (`LogicalPlan`, `ExecutionPlan` and 
`PhysicalExpr`) use `Arc`s for the links between the nodes. I think this is 
very important in their behaviour. E.g. a big issue for `Expr`s when the nodes 
are cloned as it basicaly  means cloning the whole subtree under the node. But 
cloning is not an issue for the other 3 as cloning `Arc`s is cheap. So I think 
the problem with the current `Expr.apply_children()` and the 
`Expr.children_nodes()` in this PR that it doesn't move the code into the a 
direction where cloning is not necessary on trees made of `Box`es.
     What I would suggest long term is in 
https://github.com/apache/arrow-datafusion/pull/7942. Instead of the old 
`TreeNode.apply_children()` (or collecting the children into a `Vec`) we could 
use `TreeNode.visit_children()` 
(https://github.com/apache/arrow-datafusion/pull/7942/files#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R31).
 IMO we need to implement `visit_children()` for the 4 base tree types only 
(see 1.).
   (Unfortunately I didn't noted the visit related changes in the 
https://github.com/apache/arrow-datafusion/pull/7942 PR description, but it is 
a big PR with many different ideas and focusing on transform seemed more 
important.) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Cleanup TreeNode implementations [arrow-datafusion]

Reply via email to