[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820902#comment-16820902
]
Lai Zhou commented on CALCITE-2973:
-----------------------------------
[~zabetak], I can't find a good way to break a theta join into an equi-join +
filter/projection, and I think doing so would also make the rules hard to understand.
But I found another simple and clear way; please see the latest commit:
https://github.com/apache/calcite/pull/1156/files
We still keep EquiJoin as a pure equi-join, without a remaining condition.
For a theta join, as Calcite defines it in EnumerableJoinRule,
{code:java}
!info.isEqui() && join.getJoinType() != JoinRelType.INNER{code}
if it has equi keys, we can use a hash join or merge join instead of a
nested-loop join to improve performance.
So I introduced a new join rel named `EnumerableThetaHashJoin`. In addition,
I found some differences between the algorithms for a pure hash join and a
hash join with a remaining condition:
When we implement a pure hash join, we only need to compare the hash join keys,
but when we implement a hash join with a remaining condition, we also need to
compare other columns to find the unmatched records.
So I introduced a new method named `thetaHashJoin` in EnumerableDefaults.
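
To illustrate the difference described above, here is a minimal sketch of a left outer hash join with a remaining condition. It is not the code from the pull request: the class `ThetaHashJoinSketch`, the method `leftJoin`, and all parameter names are hypothetical and chosen only for illustration. The point it tries to show is that a left row counts as unmatched when no right row with the same key also passes the remaining predicate, not merely when the key lookup is empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch; not Calcite's EnumerableDefaults implementation. */
final class ThetaHashJoinSketch {
  /**
   * Left outer join: look up candidates by the equi key, then apply the
   * remaining (non-equi) predicate to each candidate pair.
   */
  static <L, R, K> List<Object[]> leftJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remaining) {
    // Build phase: index the right input by its equi-join key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      table.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    // Probe phase: a key hit is only a candidate match.
    List<Object[]> out = new ArrayList<>();
    for (L l : left) {
      boolean matched = false;
      List<R> candidates =
          table.getOrDefault(leftKey.apply(l), Collections.<R>emptyList());
      for (R r : candidates) {
        if (remaining.test(l, r)) {
          out.add(new Object[] {l, r});
          matched = true;
        }
      }
      // Unmatched means "no candidate passed the remaining predicate",
      // which also covers the case of no candidates at all.
      if (!matched) {
        out.add(new Object[] {l, null});
      }
    }
    return out;
  }
}
```

With a condition such as `l.id = r.id AND l.value > r.value`, a left row whose key exists on the right side but whose value never exceeds the right value is still emitted once, padded with null, which a pure hash join on the keys alone would not do.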
> Allow theta joins that have equi conditions to be executed using a hash join
> algorithm
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule supports only inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join will take dozens of times longer than the sort-merge
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will
> improve performance greatly.