peter-toth commented on PR #8835:
URL:
https://github.com/apache/arrow-datafusion/pull/8835#issuecomment-1893641606
Let me give a bit more details to this issue and it might help to decide how
to proceed with the `Transformed` enum.
1. Probably we all agree that currently `Transformed` has no point. IMO we
should either remove it or make use of it / fix it.
2. I can think of 2 uses of this enum:
1. During transform/rewrite to speed up transformation by avoiding deep
compare of nodes.
The rationale here is that we can avoid deep compare of old vs.
transformed children if the `Transformed` state of children is propagated up to
the current node. The comparison would be needed to avoid allocating new nodes
if no change was made to the node's subtree.
2. Return it to the caller to be able to define the above rule sequence:
> `if rule A is applied, then return new plan; else if rule B is
applied, then return new plan; else return original plan`
By "rule" I'm not refering to the current `OptimizerRule` because the
current `Optimizer` seems to be an "alternative" to the `TreeNode`
transform/rewrite functions. `Optimizer` / `OptimizerRule`s are also capable
to transform a `LogicalPlan` tree top-down or bottom-up way, but they simply
don't use transform / rewrite.
I don't know if there is long term goal to change `Optimizer` and
rewrite `OptimizerRule`s using `TreeNode` functions in the future, but the
`Transformed` enum doesn't affect this part of `DataFusion` now.
So by "rule" I'm just referring to subsequent `TreeNode` transform /
rewrite calls that need the information if any change was made to the tree by
the previous call.
(i.) is not fulfilled currently. This is because the `transform()` and
`rewrite()` doesn't propagate up the the `Transformed` enum. I.e. `transform()`
/ `rewrite()` could return `Result<Transformed<Self>>` or `Result<(Transformed,
Self)>` (instead of the current `Result<Self>`) so that `map_children()` could
use it to make decisions if any of the childrens were transformed to avoid old
/ new children deep comparison. But if we want to do that we need to change
`transform()` and `rewrite()` signature.
BTW if `transform()` and `rewrite()` returned with
`Result<Transformed<Self>>` then it would solve (ii.) as well.
3. But If we dig deeper there are 3 categories of trees and it seems not all
of them needs to avoid deep compare of nodes.
- `DynTreeNode`s (`ExecutionPlan`, `PhysicalExpr` and derived trees) that
have immutable `Arc<dyn ...>` nodes:
I think this category is to most simple as these trees don't need the
`TransformedEnum` to check if any of the children was transformed and never do
deep comparison of nodes. This is because the immutable nodes are considered
equal only if they point to same allocation. This is already done by
`with_new_children_if_necessary`:
https://github.com/apache/arrow-datafusion/blob/5433b52e93ec1031cf08d73d20556421f604a1e0/datafusion/physical-plan/src/lib.rs#L477-L494
called from `map_children()`, so no deep compare in this case now.
- `Expr` that uses `Box`es as links between nodes:
I think this is also simple because `Expr` uses `Box`es so a node owns
it subtree and the tree is mutable. The best that `map_children()` can do is to
mutate the nodes in place. I.e. a transformed node doesn't require newly
allocated ancestor nodes. That's why `Expr::map_children()` don't need to do
any comparison.
- `LogicalPlan` that uses `Arc`s as links between nodes:
This looks like we can have deep compare here now:
https://github.com/apache/arrow-datafusion/blob/5433b52e93ec1031cf08d73d20556421f604a1e0/datafusion/expr/src/tree_node/plan.rs#L111-L119
Though `Arc` comparison is short circuited to equal if 2 `Arc`s point to the
same allocation, we can have a costly deep compare up in a node if a deep tree
was modified at a leaf node. But I don't think this is neccessary. We could
implement a special compare function for `LogicalPlan` that treats `Arc`s equal
only if they point to the same allocation, similarly to what we do in
`DynTreeNode`s.
So my point is that:
- for (i.) we don't necessarily need the `Transformed` enum
- and for (ii.) we can use such closures that modify a outer flag to detect
if any change was made to a tree during transform.
BTW, if you think that keeping the `Transformed` still is preferred then I
would like to change this PR to make use of it and change
`TreeNode::transform()` / `TreeNode::rewrite()` signatures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]