Ruben Q L created CALCITE-3671:
----------------------------------

             Summary: Join cost computation should consider join condition 
(equi vs non-equi)
                 Key: CALCITE-3671
                 URL: https://issues.apache.org/jira/browse/CALCITE-3671
             Project: Calcite
          Issue Type: Improvement
    Affects Versions: 1.21.0
            Reporter: Ruben Q L


In some join algorithms, the actual cost of performing the join depends on 
whether or not the join condition is an equi-join; therefore 
computeSelfCost should reflect that.
This is the case, for example, for HashJoin (which now supports all types of 
join condition, see CALCITE-2973) and MergeJoin (idem, CALCITE-3285).
To sum up, we can have three different scenarios:

a) The condition is a "complete equi-join condition"; this is the best case 
scenario, the join is performed purely on a hash/merge based algorithm and no 
extra predicate is required.
b) The condition is a "partial equi-join condition", i.e. the condition 
contains some equi-join items, but also some non-equi-join items; in this case 
the join is performed on a hash/merge based algorithm (for the equi-join items) 
+ an extra predicate (for the non-equi-join ones).
c) The join condition is a "complete non-equi-join-condition", i.e. there are 
no equi-join elements to build a hash/merge based solution, so the algorithm is 
performed based on a predicate which evaluates the whole condition. This is the 
worst-case scenario, since the Hash/Merge Join actually behaves as a kind of 
de-facto nested loop join.

Currently, since computeSelfCost does not evaluate the nature of the 
condition, cases a, b and c have an equivalent cost; the cost should somehow 
reflect that: cost a < cost b < cost c
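
As a rough illustration of the ordering above, here is a minimal, 
self-contained sketch of a toy cost function. It is not Calcite code and not a 
proposed patch: the class JoinCostSketch, the enum ConditionKind, the constant 
EXTRA_PREDICATE_FACTOR and the formulas themselves are all hypothetical, and 
the sketch assumes both inputs have at least one row. A real implementation 
would live in each join's computeSelfCost and use the planner's own cost 
objects.

```java
/**
 * Hypothetical sketch, not Calcite code: a toy cost model that ranks the
 * three join-condition scenarios. All names and constants are assumptions.
 */
final class JoinCostSketch {

  /** The three scenarios a), b) and c) from the issue description. */
  enum ConditionKind { COMPLETE_EQUI, PARTIAL_EQUI, COMPLETE_NON_EQUI }

  /** Assumed surcharge for evaluating the extra predicate in scenario b). */
  static final double EXTRA_PREDICATE_FACTOR = 0.25;

  /** Classifies a condition by how many equi and non-equi items it has. */
  static ConditionKind classify(int equiKeys, int nonEquiItems) {
    if (equiKeys == 0) {
      // No keys to hash or merge on: the whole condition is a predicate.
      return ConditionKind.COMPLETE_NON_EQUI;
    }
    return nonEquiItems == 0
        ? ConditionKind.COMPLETE_EQUI
        : ConditionKind.PARTIAL_EQUI;
  }

  /** Toy cost; assumes leftRows >= 1 and rightRows >= 1. */
  static double cost(double leftRows, double rightRows,
      int equiKeys, int nonEquiItems) {
    // Reading both inputs once is paid in every scenario.
    double scanCost = leftRows + rightRows;
    switch (classify(equiKeys, nonEquiItems)) {
    case COMPLETE_EQUI:
      return scanCost;
    case PARTIAL_EQUI:
      // Hash/merge on the equi items plus an extra predicate.
      return scanCost * (1 + EXTRA_PREDICATE_FACTOR);
    default:
      // De-facto nested loop: the predicate runs on every pair of rows.
      return scanCost + leftRows * rightRows;
    }
  }

  public static void main(String[] args) {
    System.out.println("a) " + cost(1000, 100, 2, 0));
    System.out.println("b) " + cost(1000, 100, 2, 1));
    System.out.println("c) " + cost(1000, 100, 0, 1));
  }
}
```

With inputs of at least one row each, this toy model always yields 
cost a < cost b < cost c; the exact surcharge for case b is an arbitrary 
choice and would need tuning against real workloads.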



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
