blaginin commented on issue #9373: URL: https://github.com/apache/datafusion/issues/9373#issuecomment-2424113550
I think `ConcreteTreeNode` and `DynTreeNode` are the best places to write iterative algorithms. They already have methods to get and replace children, so we can flatten and update them iteratively. Re-implementing non-recursive `visit`/`rewrite`/`transform` methods for `ConcreteTreeNode` and `DynTreeNode` feels like a good first step to me. Once this is done, we will need to implement either of those two traits for `LogicalPlan`: Because `LogicalPlan` stores its children in `Arc`s, it feels natural to implement support for `DynTreeNode`. But I think this approach will be slow. Right now, when we do `map_children`, we [destruct](https://github.com/apache/datafusion/blob/7a3414774cb7858d9649820ddffa59f5712a3153/datafusion/expr/src/logical_plan/tree_node.rs#L78) `self` before invoking the given function on the `Arc::unwrap_or_clone(Arc<LogicalPlan>)`. This means that most likely, we _just unwrap_ the child logical plan _without copy_ because no one else is probably referencing it. However, if we implement support for the `DynTreeNode` approach, we won't be able to pull off the same trick with zero copy. The node will still be referenced by its parent, so `unwrap_or_clone` will become costly. Hence, I think `ConcreteTreeNode` will probably be a better candidate. The transition will be a bit tricky, because currently `LogicalPlan` stores only shared references to the objects. But I think we can update this with minimal API changes. I see two options for that: 1. To me personally, it feels natural for `LogicalPlan` to _own_ its children, for example by storing them in `Box` rather than in `Arc`. Even now, if the child plan is referenced multiple times, we'll copy it on rewrite anyway, so I don't think this proposal will increase the clone rate. But on the negative side, we'll have to change `LogicalPlan`, `Subquery`, `Expr` inner types and some methods... 2. Alternatively, we can switch to capturing state in closures, for example: ```rust pub enum MaybeBarren<T> { // node is present Present(T), // node has been destructed, but can be re-constructed when we get its children Barren(Box<dyn FnOnce(Vec<T>) -> Result<T>>), } ``` There will still be some struct and methods changes, but probably less than in the option one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
