Lordworms commented on issue #9576:
URL:
https://github.com/apache/arrow-datafusion/issues/9576#issuecomment-2009267298
> > Hey @mustafasrepo, the reason that the first plan added a new Projection
is that in the Rewriter it would mark the c3+c4 twice so that it judges the
expressions needed to add an extra Projection layer. However, here I got two
Problem and wish you could give me an answer. Currently, it seems like I have
two ways to implement this feature
> >
> > 1. I can directly go through all the expressions of a single plan and if
I find a BinaryOp then I just add another Projection upon the current plan with
sub query (easy solution)
> > 2. The second one is that we should track a DAG over the expression and
if it is referenced in another plan, we add an extra projection. (which I have
no idea how to properly trace them in treenode recursion), I don't want to do
another recursion.
> > Which one do you think is better?
>
> The current approach is we use projection to calculate a complex
expression if it is used at least twice (Otherwise projection deemed
unnecessary). Hence, first option wouldn't work in this case. The other
approach may work, however, it may place the projection in a sub-optimal spot.
As an example, consider following plan,
>
> ```
> Projection(a+b)
> --Filter (a+b=0),
> ----Sort(a+b ASC),
> ------TableScan(a,b)
> ```
>
> with second approach you might produce plan below (still better than
current behaviour. However, sub-optimal)
>
> ```
> Projection(`a+b`)
> --Filter (`a+b`=0),
> ----Projection (a+b as `a+b`)
> ------Sort(a+b ASC),
> --------TableScan(a,b)
> ```
>
> where I used `a+b` to distinguish it from binary expression `a+b`.
However, instead we could have generated following plan
>
> ```
> Projection(`a+b`)
> --Filter (`a+b`=0),
> ----Sort(`a+b` ASC),
> ------Projection (a+b as `a+b`)
> --------TableScan(a,b)
> ```
>
> Hence, I think best approach is to traverse plan from top to bottom and
keeping the cumulative complex expression counts in the plan. For plan below
>
> ```
> Projection(a+b)
> --Filter (a+b=0),
> ----Sort(a+b ASC),
> ------TableScan(a,b)
> ```
>
> This would produce
>
> ```
> Projection(a+b), ("a+b", count: 1)
> --Filter (a+b=0), ("a+b", count: 2)
> ----Sort(a+b ASC), ("a+b", count: 2)
> ------TableScan(a,b)
> ```
>
> Then after constructing, above tree. With a bottom-up traversal we can
generate following plan
>
> ```
> Projection(`a+b`)
> --Filter (`a+b`=0),
> ----Sort(`a+b` ASC),
> ------Projection (a+b as `a+b`)
> --------TableScan(a,b)
> ```
>
> by implacing projections to calculate common expression that are used more
than once by subsequent stages. However, I presume this would involve a lot of
work.
I got it, Thanks for your solutions. I plan to implement this today.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]