peter-toth commented on PR #9780: URL: https://github.com/apache/arrow-datafusion/pull/9780#issuecomment-2031496122
I wonder if it would make sense to split this PR into 2 parts as it seem to mix 2 very different ideas. - It introduces an in place mutation `TreeNode` API. First of all, I really like that API and I too think it can be faster than the current consume and procude style ones (`TreeNode::rewrite()`, `TreeNode::transform_down_up()`). But I think the only reason why it is faster is because only smaller references (e.g. `&mut LogicalPlan`s) are stored on the stack during a transformation and not the original structures themselves (`LogicalPlan`s). Otherwise the 2 APIs are very similar, there should be nothing that you can do with a `&mut LogicalPlan`s but you can't do with an owned `LogicalPlan`. - It uses a trick in `rewrite_arc` to avoid copying `Arc`s. But this trick shouldn't require introducing the new API, it should simply work with changing the existing `LogicalPlan::map_children()`. So I don't think we need a new API for https://github.com/apache/arrow-datafusion/pull/9768. Don't get me wrong, I favour the new API but I would rather move that to a completely separate PR and slowly transform the code to using that in the future. As for the new `LogicalPlan` methods (`rewrite_inputs()`, `rewrite_subqueries()`, `rewrite_exprs()`, ...) in this PR, the problem is that they use the new `Transformed::and_then()` method to iterate over sibling nodes in the tree (e.g. https://github.com/apache/arrow-datafusion/pull/9780/files#diff-b2431d53b487c80c797aecd88859229fb902f74bf888a598878386055b717bbcR188-R189 or https://github.com/apache/arrow-datafusion/pull/9780/files#diff-b2431d53b487c80c797aecd88859229fb902f74bf888a598878386055b717bbcR229-R230). But that's not correct as the new `and_then()` doesn't respect `TreeNodeRecursion::Stop` behaviour. We already have `TransformedIterator::map_until_stop_and_collect()` and `TreeNode::try_transform_node()` to chain transformations in such cases (see examples in `Expr::map_children()`). BTW, I'm trying to simplyfy those `try_transform_node` chained calls in https://github.com/apache/arrow-datafusion/pull/9876. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
