Hi,
I'm investigating how to consider relational expression reuse in Volcano cost
model. CALCITE-481<https://jira.apache.org/jira/browse/CALCITE-481> is a good
start point and the JIRA description mentions integer linear programming, whose
compute complexity is too high in some cases. So I propose a simpler method,
and let's discuss it here.
For instance, here's example a Union with two inputs.
UnionAll
input(0) == Filter - Scan
input(1) == Filter - Scan
In currently implementation the cumulative cost of Union is Cost(UnionAll) +
Cost(Filter) * 2 + Cost(Scan) * 2. If filter and the scan is the same, the real
cost is Cost(UnionAll) + Cost(Filter) + Cost(Scan). The key point is how to
detect the reused expression. We could resolve it by recording the RelNodes
used in cost calculating, and it's easy to find Filter and Scan is computed
twice.
On one hand, the method should be faster than integer linear programming. On
the other hand it may miss the "real" optimal plan because it considers only
local cheapest cost in subset cost computing.
What do you think about the method? Comments are welcome. 🙂