blaginin commented on issue #9373:
URL: https://github.com/apache/datafusion/issues/9373#issuecomment-2424113550

   I think `ConcreteTreeNode` and `DynTreeNode` are the best places to write 
iterative algorithms. They already have methods to get and replace children, so 
we can flatten and update them iteratively. Re-implementing non-recursive 
`visit`/`rewrite`/`transform` methods for `ConcreteTreeNode` and `DynTreeNode` 
feels like a good first step to me.
   
   Once this is done, we will need to implement either of those two traits for 
`LogicalPlan`:
   
   Because `LogicalPlan` stores its children in `Arc`s, it feels natural to 
implement support for `DynTreeNode`. But I think this approach will be slow. 
Right now, when we do `map_children`, we 
[destruct](https://github.com/apache/datafusion/blob/7a3414774cb7858d9649820ddffa59f5712a3153/datafusion/expr/src/logical_plan/tree_node.rs#L78)
 `self` before invoking the given function on the 
`Arc::unwrap_or_clone(Arc<LogicalPlan>)`. This means that most likely, we _just 
unwrap_ the child logical plan _without copy_ because no one else is probably 
referencing it.
   However, if we implement support for the `DynTreeNode` approach, we won't be 
able to pull off the same trick with zero copy. The node will still be 
referenced by its parent, so `unwrap_or_clone` will become costly.
   
   Hence, I think `ConcreteTreeNode` will probably be a better candidate. The 
transition will be a bit tricky, because currently `LogicalPlan` stores only 
shared references to the objects. But I think we can update this with minimal 
API changes.
   
   I see two options for that:
   1. To me personally, it feels natural for `LogicalPlan` to _own_ its 
children, for example by storing them in `Box` rather than in `Arc`. Even now, 
if the child plan is referenced multiple times, we'll copy it on rewrite 
anyway, so I don't think this proposal will increase the clone rate. But on the 
negative side, we'll have to change `LogicalPlan`, `Subquery`, `Expr` inner 
types and some methods...
   
   2. Alternatively, we can switch to capturing state in closures, for example:
   ```rust
   pub enum MaybeBarren<T> {
       // node is present
       Present(T),
       // node has been destructed, but can be re-constructed when we get its 
children
       Barren(Box<dyn FnOnce(Vec<T>) -> Result<T>>),
   }
   ```
   There will still be some struct and methods changes, but probably less than 
in the option one. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to