[ 
https://issues.apache.org/jira/browse/CALCITE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008724#comment-17008724
 ] 

Ruben Q L edited comment on CALCITE-3671 at 1/6/20 10:56 AM:
-------------------------------------------------------------

Yes, we can do that, but in any case scenario (c) could still happen if someone 
creates their own EnumerableHashJoinRule, or if we ever carry out CALCITE-3585. 
That is the reason for the current ticket: even if a HashJoin with a complete 
non-equi condition is generated, its cost should be increased accordingly, so 
that another (cheaper) option (e.g. NestedLoopJoin) is chosen instead.
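The reasoning above relies on the planner keeping the cheapest alternative. The following is a minimal sketch of that effect, not Calcite's planner or cost model. The class `CheapestJoinSketch`, the methods `cheapest` and `hashJoinCost`, the 10x penalty and all cost figures are hypothetical. It only shows that, without a penalty for a condition with no equi-join keys, the HashJoin can still look cheaper than NestedLoopJoin.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Calcite's planner: the optimizer keeps the cheapest
// candidate, so a HashJoin over a condition with no equi-join keys must be
// penalized for NestedLoopJoin to win.
final class CheapestJoinSketch {

    // A physical join alternative with its estimated self cost.
    record Candidate(String name, double cost) {}

    // Returns the candidate with the lowest cost.
    static Candidate cheapest(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(Candidate::cost))
                .orElseThrow();
    }

    // Hash join cost: the base estimate, multiplied by a penalty when the
    // condition has no equi-join keys (the algorithm then degenerates into
    // evaluating the predicate on every pair of rows).
    static double hashJoinCost(double baseCost, boolean hasEquiKeys, double nonEquiPenalty) {
        return hasEquiKeys ? baseCost : baseCost * nonEquiPenalty;
    }

    public static void main(String[] args) {
        double hashBase = 100.0;       // hypothetical estimate that ignores the condition
        double nestedLoopCost = 120.0; // hypothetical estimate for NestedLoopJoin

        // No penalty: the non-equi HashJoin looks cheaper and is picked.
        Candidate withoutPenalty = cheapest(List.of(
                new Candidate("HashJoin", hashJoinCost(hashBase, false, 1.0)),
                new Candidate("NestedLoopJoin", nestedLoopCost)));

        // Hypothetical 10x penalty: NestedLoopJoin becomes the cheaper option.
        Candidate withPenalty = cheapest(List.of(
                new Candidate("HashJoin", hashJoinCost(hashBase, false, 10.0)),
                new Candidate("NestedLoopJoin", nestedLoopCost)));

        System.out.println(withoutPenalty.name()); // HashJoin
        System.out.println(withPenalty.name());    // NestedLoopJoin
    }
}
```

The penalty factor itself is arbitrary here; what matters is only that it pushes the non-equi HashJoin above the NestedLoopJoin estimate.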



> Join cost computation should consider join condition (equi vs non-equi)
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-3671
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3671
>             Project: Calcite
>          Issue Type: Improvement
>    Affects Versions: 1.21.0
>            Reporter: Ruben Q L
>            Priority: Major
>
> In some join algorithms, the actual cost of performing the join depends on 
> whether or not the join condition is an equi-join, so computeSelfCost should 
> reflect that.
> This is the case, for example, for HashJoin (which now supports all types of 
> join conditions, see CALCITE-2973) and MergeJoin (likewise, see CALCITE-3285).
> To sum up, there are three different scenarios:
> a) The condition is a "complete equi-join condition": this is the best-case 
> scenario, since the join is performed purely by a hash/merge-based algorithm 
> and no extra predicate is required.
> b) The condition is a "partial equi-join condition", i.e. it contains some 
> equi-join items but also some non-equi-join items; in this case the join is 
> performed by a hash/merge-based algorithm (for the equi-join items) plus an 
> extra predicate (for the non-equi-join items).
> c) The condition is a "complete non-equi-join condition", i.e. there are no 
> equi-join items on which to build a hash/merge-based solution, so the 
> algorithm evaluates a predicate for the whole condition. This is the 
> worst-case scenario, since the Hash/Merge Join actually behaves as a de facto 
> nested loop join.
> Currently, since the nature of the condition is not evaluated in 
> computeSelfCost, cases a, b and c have an equivalent cost; the cost should 
> somehow reflect that cost(a) < cost(b) < cost(c).
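The three scenarios and the requested ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not Calcite's API. The join condition is modeled as a hypothetical list of AND-ed `Conjunct`s, each flagged as an equi-join key or not. `JoinConditionCostSketch`, `classify`, `selfCost` and the multipliers 1.0 / 1.5 / 10.0 are all hypothetical; the ticket only asks that cost(a) < cost(b) < cost(c). An empty conjunct list (a TRUE condition) has no keys either, so this sketch treats it as case (c).

```java
import java.util.List;

// Hypothetical sketch, not Calcite's API: classify a join condition, modeled
// as a list of AND-ed conjuncts, into scenarios (a), (b), (c) and scale a base
// cost so that cost(a) < cost(b) < cost(c).
final class JoinConditionCostSketch {

    // One conjunct; equiKey is true for an equality between a left column and
    // a right column (usable as a hash/merge key).
    record Conjunct(String text, boolean equiKey) {}

    enum ConditionKind { COMPLETE_EQUI, PARTIAL_EQUI, NON_EQUI }

    static ConditionKind classify(List<Conjunct> conjuncts) {
        long equiCount = conjuncts.stream().filter(Conjunct::equiKey).count();
        if (equiCount == 0) {
            return ConditionKind.NON_EQUI;      // (c): no keys, predicate on every pair
        }
        if (equiCount == conjuncts.size()) {
            return ConditionKind.COMPLETE_EQUI; // (a): keys only, no extra predicate
        }
        return ConditionKind.PARTIAL_EQUI;      // (b): keys plus residual predicate
    }

    // Hypothetical multipliers; only their ordering matters for the ticket.
    static double selfCost(double baseCost, ConditionKind kind) {
        return switch (kind) {
            case COMPLETE_EQUI -> baseCost;
            case PARTIAL_EQUI -> baseCost * 1.5;
            case NON_EQUI -> baseCost * 10.0;
        };
    }

    public static void main(String[] args) {
        Conjunct idEq = new Conjunct("l.id = r.id", true);
        Conjunct tsLt = new Conjunct("l.ts < r.ts", false);
        for (List<Conjunct> condition : List.of(
                List.of(idEq), List.of(idEq, tsLt), List.of(tsLt))) {
            ConditionKind kind = classify(condition);
            System.out.println(kind + " -> " + selfCost(100.0, kind));
        }
    }
}
```

Running main prints `COMPLETE_EQUI -> 100.0`, `PARTIAL_EQUI -> 150.0` and `NON_EQUI -> 1000.0`, one per line. A fixed multiplier keeps the ordering independent of input sizes. A row-count-based model (e.g. left * right pairs for case (c)) would be closer to the "de facto nested loop" behavior, but it would not guarantee the ordering for very small inputs.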



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
