[ https://issues.apache.org/jira/browse/CALCITE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008718#comment-17008718 ]
Stamatis Zampetakis commented on CALCITE-3671:
----------------------------------------------
Thanks for the clarification. Apart from the changes to handle cases (a) and
(b), maybe we should also add some assertions to make sure that (c) can never
appear, now or in the future.
> Join cost computation should consider join condition (equi vs non-equi)
> -----------------------------------------------------------------------
>
> Key: CALCITE-3671
> URL: https://issues.apache.org/jira/browse/CALCITE-3671
> Project: Calcite
> Issue Type: Improvement
> Affects Versions: 1.21.0
> Reporter: Ruben Q L
> Priority: Major
>
> In some Join algorithms, the actual cost of performing the join depends on
> whether or not the join condition is an equi-join; computeSelfCost should
> therefore reflect that.
> This is the case, for example, for HashJoin (which now supports all types
> of join conditions, see CALCITE-2973) and MergeJoin (idem, CALCITE-3285).
> To sum up, we can have three different scenarios:
> a) The condition is a "complete equi-join condition"; this is the best case
> scenario, the join is performed purely on a hash/merge based algorithm and no
> extra predicate is required.
> b) The condition is a "partial equi-join condition", i.e. the condition
> contains some equi-join items, but also some non-equi-join items; in this
> case the join is performed on a hash/merge based algorithm (for the equi-join
> items) + an extra predicate (for the non-equi-join ones).
> c) The join condition is a "complete non-equi-join condition", i.e. there are
> no equi-join elements to build a hash/merge based solution, so the algorithm
> is performed based on a predicate which evaluates the whole condition. This
> is the worst-case scenario, since the Hash/Merge Join actually behaves as a
> kind of de facto nested loop join.
> Currently, since computeSelfCost does not examine the nature of the
> condition, cases a, b and c all get the same cost; the cost model should
> somehow reflect that: cost a < cost b < cost c
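
As a rough illustration of the ordering described above, the three scenarios
could be modelled as follows. This is a standalone sketch, not Calcite code:
the class JoinCostSketch, the enum ConditionKind, the methods classify and
cost, and the multipliers 1.0 / 2.0 / 10.0 are all hypothetical placeholders
chosen only so that cost a < cost b < cost c holds for non-empty inputs.

```java
/** Standalone sketch; it does not use or mirror any Calcite API. */
public class JoinCostSketch {

    /** The three scenarios (a), (b) and (c) from the issue description. */
    public enum ConditionKind { COMPLETE_EQUI, PARTIAL_EQUI, COMPLETE_NON_EQUI }

    /**
     * Classifies a join condition from the number of equi-join items and
     * non-equi-join items it contains (hypothetical inputs).
     */
    public static ConditionKind classify(int equiItems, int nonEquiItems) {
        if (equiItems > 0 && nonEquiItems == 0) {
            return ConditionKind.COMPLETE_EQUI;      // case (a)
        }
        if (equiItems > 0) {
            return ConditionKind.PARTIAL_EQUI;       // case (b)
        }
        // No equi-join items at all (this also covers an empty condition).
        return ConditionKind.COMPLETE_NON_EQUI;      // case (c)
    }

    /**
     * Toy cost: a base cost proportional to the input sizes, scaled by an
     * arbitrary, assumed penalty per scenario.
     */
    public static double cost(double leftRows, double rightRows, ConditionKind kind) {
        double base = leftRows + rightRows;
        switch (kind) {
        case COMPLETE_EQUI:
            return base * 1.0;   // pure hash/merge, no extra predicate
        case PARTIAL_EQUI:
            return base * 2.0;   // hash/merge plus an extra predicate
        default:
            return base * 10.0;  // behaves like a nested loop join
        }
    }
}
```

For example, with 1000 and 200 input rows the sketch yields 1200.0 for (a),
2400.0 for (b) and 12000.0 for (c). A real implementation would derive the
penalties from the planner's own cost model rather than fixed constants.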
--
This message was sent by Atlassian Jira
(v8.3.4#803005)