peter-toth commented on issue #8663: URL: https://github.com/apache/arrow-datafusion/issues/8663#issuecomment-1871218567
I think think this is partly the same idea I was suggesting in 1. of https://github.com/apache/arrow-datafusion/pull/7942. The main point there was that the closures change from `FnMut(Self) -> Result<...>` to the self mutating `FnMut(&mut Self) -> Result<...>`. (Please note that the `TreeNodeTransformer` trait is not important in my PR. I just added there it to make `transform()` similar to the existing `rewrite()`. `rewrite()` uses a `TreeNodeRewriter` that has a pre-order (`pre_visit()`) and post-order like (`mutate()`) methods collected into one object, but I too would prefer separate `f_down` and `f_up` closures. Actually the new `transform_with_payload()` in https://github.com/apache/arrow-datafusion/pull/8664 follows that idea.) Where I see some differences is the return type of our closures. Your code snipett shows `Result<()>` while I was suggesting `Result<TreeNodeRecursion>`. This is because I wanted to deprecate the current `rewrite()` / `TreeNodeRewriter` and replace/combine that with a new `transform()` method. If we want to do that then the new closures need to be able to prune the tree traversal (skip a subtree) and I don't think we can do it with a simple `Result<()>`. (Actaully as the same pruning capability is needed for the `TreeNode.apply()` / `TreeNode.visit()` as well so I poposed to have a common `TreeNodeRecursion` enum for visit and transform functions. This is 2. in https://github.com/apache/arrow-datafusion/pull/7942) So basically here is the `transform()` I suggested: ```rust trait TreeNode_v2: Sized { fn transform_children<F>(&mut self, f: &mut F) -> Result<TreeNodeRecursion> where F: FnMut(&mut Self) -> Result<TreeNodeRecursion>; fn transform(&mut self, f_down: &mut F,f_up: &mut F) -> Result<TreeNodeRecursion> where F: FnMut(&mut self) -> Result<TreeNodeRecursion>, { // Apply `f_down` on self. f_down(self)? // If it returns continue (not prune or stop or stop all) then continue // traversal on inner children and children. .and_then_on_continue(|| // Run the recursive `transform` on each children. self .transform_children(&mut |c| c.transform(f_down, f_up))? // Apply `f_up` on new self. .and_then_on_continue(|| f_up(self)))? // Applying `pre_transform` or `post_transform` on self might have returned // prune, but we need to propagate continue. .continue_on_prune() } } ``` Regarding > the key concept is that the user only must implement how the node reaches its children I totally aggree. But my question about `fn children<'a>(&mut self) -> &'a mut [Self];` is that can we return children as `&mut[Self]` without collecting them into a temporary collection like `Vec`? Currently `Expr.map_children()` doesn't collect the children into a temporary collection so if we start doing it then we might introduce some performance regression. BTW I was trying to follow a very similar approach initially. My initial `children()` returned an `impl Iterator<Item=&mut Self>` so as to avoid collecting childrens. But I simply couldn't implement it for `Expr` without returning an dynamic dispatch iterator so I remained at the above `transform_children()` that is similar to the current `map_children()`. Lastly, in this particulat issue I wanted to see if there is a way to get rid of those special trees. It seems unecessary to create those thees and I found it very hard to follow how additonal payload is propagated down/up. In https://github.com/apache/arrow-datafusion/pull/8664 I removed `SortPushDown` and `PlanWithRequitements` and `ExprOrdering` with the help of `transform_down_with_payload()` and `transform_up_with_payload()` functions and I think the new `f_down` / `f_up` closures are much simpler to understand now. But, I'm open for other proposals like @alamb's https://github.com/apache/arrow-datafusion/issues/8663#issuecomment-1870402672 although I'm not sure yet if capturing the state in the closures make things easier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
