peter-toth commented on code in PR #8672:
URL: https://github.com/apache/arrow-datafusion/pull/8672#discussion_r1438363204
##########
datafusion/expr/src/tree_node/expr.rs:
##########
@@ -24,16 +24,13 @@ use crate::expr::{
};
use crate::{Expr, GetFieldAccess};
-use datafusion_common::tree_node::{TreeNode, VisitRecursion};
+use datafusion_common::tree_node::TreeNode;
use datafusion_common::{internal_err, DataFusionError, Result};
impl TreeNode for Expr {
- fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion>
- where
- F: FnMut(&Self) -> Result<VisitRecursion>,
- {
- let children = match self {
- Expr::Alias(Alias{expr,..})
+ fn children_nodes(&self) -> Vec<Self> {
Review Comment:
This PR is a nice refactor but I'm not sure it is needed at all (1.) and I'm
also not sure cloning a node's children into a `Vec` is good direction (2.).
Let me share my thoughts on these 2.
1. I think in DataFusion there are only 4 real tree types: `LogicalPlan`,
`Expr`, `ExecutionPlan` and `PhysicalExpr` and there 7 special tree types based
on the above 4: `SortPushDown`, `PlanWithRequitements`, `ExprOrdering`,
`OrderPreservationContext`, `PipelineStatePropagator`,
`PlanWithCorrespondingCoalescePartitions` and `PlanWithCorrespondingSort`.
The only reason why DataFusion currently has these 7 special trees is to be
able to propagate down/up some additional information during tree
transformations. But I think these special trees are simply not needed and can
be removed. If we add some transformation helper functions like
`TreeNode.transform_down_with_payload()` and
`TreeNode.transform_up_with_payload()` we can easily remove these 7 special
types completely.
https://github.com/apache/arrow-datafusion/issues/8663 is about this issue
and I removed `SortPushDown`, `PlanWithRequitements` and `ExprOrdering` as
exaples in this PR: https://github.com/apache/arrow-datafusion/pull/8664.
2. If we just focus on the 4 base tree types we can distingsuish 2 different
types: `Expr` uses `Box`es but the other 3 (`LogicalPlan`, `ExecutionPlan` and
`PhysicalExpr`) use `Arc`s for the links between the nodes. I think this is
very important in their behaviour. E.g. a big issue for `Expr`s when the nodes
are cloned as it basicaly means cloning the whole subtree under the node. But
cloning is not an issue for the other 3 as cloning `Arc`s is cheap. So I think
the problem with the current `Expr.apply_children()` and the
`Expr.children_nodes()` in this PR that it doesn't move the code into the a
direction where cloning is not necessary on trees made of `Box`es.
What I would suggest long term is in
https://github.com/apache/arrow-datafusion/pull/7942. Instead of the old
`TreeNode.apply_children()` (or collecting the children into a `Vec`) we could
use `TreeNode.visit_children()`
(https://github.com/apache/arrow-datafusion/pull/7942/files#diff-6515fda3c67b9d487ab491fd21c27e32c411a8cbee24725d737290d19c10c199R31).
IMO we need to implement `visit_children()` for the 4 base tree types only
(see 1.).
(Unfortunately I didn't noted the visit related changes in the
https://github.com/apache/arrow-datafusion/pull/7942 PR description, but it is
a big PR with many different ideas and focusing on transform seemed more
important.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]