Are you suggesting this only for UNION? I assume not, but I'm not sure I see how the approach you're suggesting will generalize. Of course, we also need to consider *how* to reuse the results of the expression.
The approach sounds like it could be reasonable, but I don't fully understand what you're proposing. In any case, I'm happy to see the discussion starting again on this!

--
Michael Mior
[email protected]

On Tue, Mar 26, 2019 at 08:02, Roger Shi <[email protected]> wrote:

> Hi,
>
> I'm investigating how to account for relational expression reuse in the
> Volcano cost model. CALCITE-481 <https://jira.apache.org/jira/browse/CALCITE-481>
> is a good starting point, and the JIRA description mentions integer linear
> programming, whose computational complexity is too high in some cases. So I
> propose a simpler method, and let's discuss it here.
>
> For instance, here's an example: a Union with two inputs.
>
> UnionAll
>
> input(0) == Filter - Scan
> input(1) == Filter - Scan
>
> In the current implementation, the cumulative cost of the Union is
> Cost(UnionAll) + Cost(Filter) * 2 + Cost(Scan) * 2. If the Filter and the
> Scan are the same, the real cost is Cost(UnionAll) + Cost(Filter) +
> Cost(Scan). The key point is how to detect the reused expression. We could
> resolve it by recording the RelNodes used in the cost calculation, and it's
> then easy to find that Filter and Scan are computed twice.
>
> On one hand, the method should be faster than integer linear programming.
> On the other hand, it may miss the "real" optimal plan because it considers
> only the locally cheapest cost when computing the cost of each subset.
>
> What do you think about the method? Comments are welcome. 🙂
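
To make the two cost formulas in the quoted message concrete, here is a minimal sketch of how I read the "record the nodes used in the cost calculation" idea. It does not use Calcite's API: `Node`, `digest`, `naiveCost` and `sharedCost` are hypothetical names, the cost values are made up, and treating a reused subtree as free is an assumption that ignores any cost of materializing or re-reading the shared result.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a relational expression; this is not Calcite's RelNode. */
final class Node {
  final String digest;    // assumed structural identity, e.g. "Filter(Scan(t))"
  final double selfCost;  // cost of this operator alone, excluding its inputs
  final List<Node> inputs;

  Node(String digest, double selfCost, Node... inputs) {
    this.digest = digest;
    this.selfCost = selfCost;
    this.inputs = List.of(inputs);
  }
}

final class ReuseCostSketch {
  /** Charges every occurrence of a subtree, as in the first formula. */
  static double naiveCost(Node node) {
    double total = node.selfCost;
    for (Node input : node.inputs) {
      total += naiveCost(input);
    }
    return total;
  }

  /** Charges each distinct digest once, as in the second formula. */
  static double sharedCost(Node node) {
    return sharedCost(node, new HashSet<>());
  }

  private static double sharedCost(Node node, Set<String> seen) {
    if (!seen.add(node.digest)) {
      return 0.0; // already costed; reuse is assumed to be free in this sketch
    }
    double total = node.selfCost;
    for (Node input : node.inputs) {
      total += sharedCost(input, seen);
    }
    return total;
  }

  public static void main(String[] args) {
    // UnionAll over two structurally identical Filter - Scan branches.
    Node scan1 = new Node("Scan(t)", 10.0);
    Node scan2 = new Node("Scan(t)", 10.0);
    Node filter1 = new Node("Filter(Scan(t))", 2.0, scan1);
    Node filter2 = new Node("Filter(Scan(t))", 2.0, scan2);
    Node union = new Node("UnionAll", 1.0, filter1, filter2);

    System.out.println(naiveCost(union));  // 1 + 2 * 2 + 10 * 2
    System.out.println(sharedCost(union)); // 1 + 2 + 10
  }
}
```

Note that the set of seen digests here is scoped to a single plan tree. In a Volcano-style planner the cost is computed per subset, so the open question from my reply remains: where that set would live, and how the planner would actually reuse the result at execution time.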
