Hi,

I'm investigating how to consider relational expression reuse in Volcano cost 
model. CALCITE-481<https://jira.apache.org/jira/browse/CALCITE-481> is a good 
start point and the JIRA description mentions integer linear programming, whose 
compute complexity is too high in some cases. So I propose a simpler method, 
and let's discuss it here.

For instance, here's example a Union with two inputs.


UnionAll

    input(0) ==  Filter - Scan
    input(1) ==  Filter - Scan

In currently implementation the cumulative cost of Union is Cost(UnionAll) + 
Cost(Filter) * 2 + Cost(Scan) * 2. If filter and the scan is the same, the real 
cost is Cost(UnionAll) + Cost(Filter) + Cost(Scan). The key point is how to 
detect the reused expression. We could resolve it by recording the RelNodes 
used in cost calculating, and it's easy to find Filter and Scan is computed 
twice.

On one hand, the method should be faster than integer linear programming. On 
the other hand it may miss the "real" optimal plan because it considers only 
local cheapest cost in subset cost computing.

What do you think about the method? Comments are welcome. 🙂

Reply via email to