Re: [I] Get rid of special TreeNodes [arrow-datafusion]

via GitHub Thu, 28 Dec 2023 06:18:53 -0800


peter-toth commented on issue #8663:
URL: 
https://github.com/apache/arrow-datafusion/issues/8663#issuecomment-1871218567


   I think think this is partly the same idea I was suggesting in 1. of 
https://github.com/apache/arrow-datafusion/pull/7942. The main point there was 
that the closures change from `FnMut(Self) -> Result<...>` to the self mutating 
`FnMut(&mut Self) -> Result<...>`.
   (Please note that the `TreeNodeTransformer` trait is not important in my PR. 
I just added there it to make `transform()` similar to the existing 
`rewrite()`. `rewrite()` uses a `TreeNodeRewriter` that has a pre-order 
(`pre_visit()`) and post-order like (`mutate()`) methods collected into one 
object, but I too would prefer separate `f_down` and `f_up` closures. Actually 
the new `transform_with_payload()` in 
https://github.com/apache/arrow-datafusion/pull/8664 follows that idea.)
   
   Where I see some differences is the return type of our closures. Your code 
snipett shows `Result<()>` while I was suggesting `Result<TreeNodeRecursion>`. 
This is because I wanted to deprecate the current `rewrite()` / 
`TreeNodeRewriter` and replace/combine that with a new `transform()` method. If 
we want to do that then the new closures need to be able to prune the tree 
traversal (skip a subtree) and I don't think we can do it with a simple 
`Result<()>`.
   (Actaully as the same pruning capability is needed for the 
`TreeNode.apply()` / `TreeNode.visit()` as well so I poposed to have a common 
`TreeNodeRecursion` enum for visit and transform functions. This is 2. in 
https://github.com/apache/arrow-datafusion/pull/7942)
   
   So basically here is the `transform()` I suggested:
   ```rust
   trait TreeNode_v2: Sized {
       fn transform_children<F>(&mut self, f: &mut F) -> 
Result<TreeNodeRecursion>
       where
           F: FnMut(&mut Self) -> Result<TreeNodeRecursion>;
   
       fn transform(&mut self, f_down: &mut F,f_up: &mut F) -> 
Result<TreeNodeRecursion>
       where 
           F: FnMut(&mut self) -> Result<TreeNodeRecursion>,
        {
           // Apply `f_down` on self.
           f_down(self)?
               // If it returns continue (not prune or stop or stop all) then 
continue
               // traversal on inner children and children.
               .and_then_on_continue(||
                   // Run the recursive `transform` on each children.
                   self
                       .transform_children(&mut |c| c.transform(f_down, f_up))?
                       // Apply `f_up` on new self.
                       .and_then_on_continue(|| f_up(self)))?
               // Applying `pre_transform` or `post_transform` on self might 
have returned
               // prune, but we need to propagate continue.
               .continue_on_prune()
       }
   }
   ```
   
   Regarding 
   
   > the key concept is that the user only must implement how the node reaches 
its children
   
   I totally aggree.
   But my question about `fn children<'a>(&mut self) -> &'a mut [Self];` is 
that can we return children as `&mut[Self]` without collecting them into a 
temporary collection like `Vec`? Currently `Expr.map_children()` doesn't 
collect the children into a temporary collection so if we start doing it then 
we might introduce some performance regression.
   BTW I was trying to follow a very similar approach initially. My initial 
`children()` returned an `impl Iterator<Item=&mut Self>` so as to avoid 
collecting childrens. But I simply couldn't implement it for `Expr` without 
returning an dynamic dispatch iterator so I remained at the above 
`transform_children()` that is similar to the current `map_children()`.
   
   Lastly, in this particulat issue I wanted to see if there is a way to get 
rid of those special trees. It seems unecessary to create those thees and I 
found it very hard to follow how additonal payload is propagated down/up. In 
https://github.com/apache/arrow-datafusion/pull/8664 I removed `SortPushDown` 
and `PlanWithRequitements` and `ExprOrdering` with the help of 
`transform_down_with_payload()` and `transform_up_with_payload()` functions and 
I think the new `f_down` / `f_up` closures are much simpler to understand now.
   But, I'm open for other proposals like @alamb's 
https://github.com/apache/arrow-datafusion/issues/8663#issuecomment-1870402672 
although I'm not sure yet if capturing the state in the closures make things 
easier.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Get rid of special TreeNodes [arrow-datafusion]

Reply via email to