berkaysynnada commented on issue #8663:
URL: 
https://github.com/apache/arrow-datafusion/issues/8663#issuecomment-1871114495

   We have also observed the complex API and different implementation 
approaches of TreeNode. I'm not sure if this is the right place to share these 
but, let me share our findings and a simple POC of how we can have a new trait 
providing a simpler and more understandable API:
   
   Since the general approach in Datafusion tree structs involves having the 
children nodes within the self node, a pure `transform_down` cannot be 
performed. Even if the nodes are applied by the given `op()` in pre-order, the 
build order is always `bottom-up`. Inserting some logic into that build process 
(into `map_children()`) can introduce hidden `transform-up` functionality. For 
`transform-up()`s, the same flexibility may also cause the spread of a rule's 
logic into different parts of the code, going against code integrity.
   
   There are also some unused functionalities and outcome types in general, so 
I think that a more simple API is enough usually. I scribbled a little and came 
up with something like this, sharing it here in case it's useful.
   
   ```
   trait TreeNode_v2: Sized {
       fn children<'a>(&mut self) -> &'a mut [Self];
   
       fn transform<F>(&mut self, op_up: &mut F, op_down: &mut F) -> Result<()>
       where
           F: FnMut(&mut Self) -> Result<()>,
       {
           op_down(self)?;
   
           let children_as_mut_ref = self.children();
           if !children_as_mut_ref.is_empty() {
               children_as_mut_ref
                   .iter_mut()
                   .map(|c| c.transform(op_up, op_down))
                   .collect::<Result<_>>()?;
           }
   
           op_up(self)?;
           Ok(())
       }
   }
   ```
   
   `op_up` and `op_down` may also be separated into 2 different functions. But 
the key concept is that the user only must implement how the node reaches its 
children, and the corresponding rule logic to apply while traversing the tree. 
It is also important that the whole tree must be created before the traversals 
with default payloads. Before the refactor, there were some places making 
on-the-fly children construction during the traversal.
   
   I don’t know yet how this design can be integrated with the existing 
TreeNode, but in many places, such a design makes things easier and more 
understandable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to