berkaysynnada commented on issue #8663:
URL:
https://github.com/apache/arrow-datafusion/issues/8663#issuecomment-1871114495
We have also observed the complex API and different implementation
approaches of TreeNode. I'm not sure if this is the right place to share these
but, let me share our findings and a simple POC of how we can have a new trait
providing a simpler and more understandable API:
Since the general approach in Datafusion tree structs involves having the
children nodes within the self node, a pure `transform_down` cannot be
performed. Even if the nodes are applied by the given `op()` in pre-order, the
build order is always `bottom-up`. Inserting some logic into that build process
(into `map_children()`) can introduce hidden `transform-up` functionality. For
`transform-up()`s, the same flexibility may also cause the spread of a rule's
logic into different parts of the code, going against code integrity.
There are also some unused functionalities and outcome types in general, so
I think that a more simple API is enough usually. I scribbled a little and came
up with something like this, sharing it here in case it's useful.
```
trait TreeNode_v2: Sized {
fn children<'a>(&mut self) -> &'a mut [Self];
fn transform<F>(&mut self, op_up: &mut F, op_down: &mut F) -> Result<()>
where
F: FnMut(&mut Self) -> Result<()>,
{
op_down(self)?;
let children_as_mut_ref = self.children();
if !children_as_mut_ref.is_empty() {
children_as_mut_ref
.iter_mut()
.map(|c| c.transform(op_up, op_down))
.collect::<Result<_>>()?;
}
op_up(self)?;
Ok(())
}
}
```
`op_up` and `op_down` may also be separated into 2 different functions. But
the key concept is that the user only must implement how the node reaches its
children, and the corresponding rule logic to apply while traversing the tree.
It is also important that the whole tree must be created before the traversals
with default payloads. Before the refactor, there were some places making
on-the-fly children construction during the traversal.
I don’t know yet how this design can be integrated with the existing
TreeNode, but in many places, such a design makes things easier and more
understandable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]