[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807372#comment-16807372
]
Julian Hyde commented on CALCITE-2973:
--------------------------------------
I agree. Merge join is well suited to joins with theta conditions that are
based on ranges. (Hash joins are good with equality but terrible with ranges.)
For example, consider the following query:
{code}select *
from orders
join shipments
on shipments.shipDate
between orders.orderDate + interval '1' day
and orders.orderDate + interval '2' day{code}
An efficient execution plan would sort {{orders}} on {{orderDate}} and
{{shipments}} on {{shipDate}} and merge {{orders}} against a 1-day range of
{{shipments}}. This is a generalization of merge join: instead of matching
rows with equal keys, each row is matched against a sliding range of keys.
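To make the idea concrete, here is a minimal sketch of such a range ("band") merge join in Python. It is not Calcite code and does not reflect how EnumerableMergeJoin is implemented. The function name {{band_merge_join}}, the {{(id, date)}} tuple layout, and the sample rows are all made up for illustration. It assumes both bounds are inclusive, as with BETWEEN.

```python
from datetime import date, timedelta


def band_merge_join(orders, shipments,
                    lo=timedelta(days=1), hi=timedelta(days=2)):
    """Pair each order with shipments whose ship date lies in
    [order_date + lo, order_date + hi], both bounds inclusive.

    orders and shipments are lists of (id, date) tuples (a hypothetical
    layout chosen for this sketch).
    """
    # Sort both inputs on the join key, as a merge join requires.
    orders = sorted(orders, key=lambda o: o[1])
    shipments = sorted(shipments, key=lambda s: s[1])

    result = []
    start = 0  # index of the first shipment that may still match
    for order_id, order_date in orders:
        low, high = order_date + lo, order_date + hi
        # Orders arrive in ascending date order, so the lower bound never
        # decreases and 'start' only ever moves forward.
        while start < len(shipments) and shipments[start][1] < low:
            start += 1
        # Scan the window of shipments inside the range for this order.
        i = start
        while i < len(shipments) and shipments[i][1] <= high:
            result.append((order_id, shipments[i][0]))
            i += 1
    return result


orders = [("o1", date(2019, 4, 1)), ("o2", date(2019, 4, 2))]
shipments = [
    ("s1", date(2019, 4, 1)),
    ("s2", date(2019, 4, 2)),
    ("s3", date(2019, 4, 3)),
    ("s4", date(2019, 4, 4)),
    ("s5", date(2019, 4, 6)),
]
matches = band_merge_join(orders, shipments)
```

After the two sorts, the work is proportional to the input sizes plus the number of shipments scanned inside each order's window, rather than to the product of the input sizes as in a nested-loop join.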
> Make EnumerableMergeJoinRule to support a theta join
> ----------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
>
> Currently, EnumerableMergeJoinRule supports only inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join takes dozens of times longer than a sort-merge
> join.
> So if we could apply a merge-join or hash-join rule to a theta join,
> performance would improve greatly.