[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820902#comment-16820902
 ] 

Lai Zhou commented on CALCITE-2973:
-----------------------------------

[~zabetak], I can't find a good way to break a theta join into an equi-join 
plus a filter/projection, and I think it would also make the rules hard to 
understand.

But I found another simple and clear way; please see the latest commit: 
https://github.com/apache/calcite/pull/1156/files

We still keep the EquiJoin as a pure equi-join without a remaining condition.

For a theta join, as Calcite defines it in EnumerableJoinRule,
{code:java}
!info.isEqui() && join.getJoinType() != JoinRelType.INNER{code}
 

if it has equi keys, we can use a hash join or merge join instead of a 
nested-loop join to improve performance.

So I introduced a new join rel named `EnumerableThetaHashJoin`. In addition, 
I found that the algorithm for a pure hash join differs from the algorithm 
for a hash join with a remaining condition:

When we implement a pure hash join, we only need to compare the hash join 
keys, but when we implement a hash join with a remaining condition, we also 
need to compare other columns to find the unmatched records.

So I introduced a new method named `thetaHashJoin` in EnumerableDefaults.
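
To illustrate the difference, here is a minimal sketch of a left outer hash join with a remaining condition. It is not the code from the pull request and does not use Calcite's API: the class name `ThetaHashJoinSketch`, the method `leftThetaHashJoin`, and its parameters are hypothetical names chosen for this example, and null-key handling is ignored. The point it shows is that a left row counts as unmatched when no row in its hash bucket passes the remaining condition, not merely when the bucket is empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Illustrative sketch only; all names here are hypothetical, not Calcite's. */
public class ThetaHashJoinSketch {

  /**
   * Left outer join that hashes on the equi keys and then applies the
   * remaining (non-equi) condition to each candidate pair.
   */
  public static <L, R, K> List<Object[]> leftThetaHashJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remaining) {
    // Build phase: group the right rows by their equi key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      table.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }

    // Probe phase: look up the bucket, then test the remaining condition.
    List<Object[]> out = new ArrayList<>();
    for (L l : left) {
      boolean matched = false;
      List<R> bucket =
          table.getOrDefault(leftKey.apply(l), Collections.<R>emptyList());
      for (R r : bucket) {
        if (remaining.test(l, r)) {
          out.add(new Object[] {l, r});
          matched = true;
        }
      }
      // Unmatched means "no pair passed the remaining condition",
      // even if the bucket for this key was not empty.
      if (!matched) {
        out.add(new Object[] {l, null});
      }
    }
    return out;
  }
}
```

With a pure hash join, the `remaining.test` call disappears and a non-empty bucket is enough to mark the left row as matched.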


> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000), 
> the nested-loop join takes dozens of times longer than the sort-merge 
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will 
> improve performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
