[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840194#comment-16840194 ]
Ruben Quesada Lopez commented on CALCITE-2973:
----------------------------------------------
Following up on my previous comment, I believe there is an alternative that
may do the job without breaking things:
- EnumerableJoin will no longer extend EquiJoin.
- It will continue having only the original "condition" field (no need to add
remainCondition as a new field).
- The "condition" will now be any type of condition (equi / non-equi).
- EnumerableJoinRule will generate EnumerableThetaJoin (i.e. NestedLoopJoin)
for "pure non-equi joins".
- EnumerableJoinRule will generate EnumerableJoin (i.e. HashJoin, possibly with
an extra predicate) for "pure and partial equi-joins".
- Inside the {{EnumerableJoin#implement}} method, the "remainCondition" will be
calculated on the fly, using the {{Join#analyzeCondition}} (or {{JoinInfo#of}})
method. If the condition is pure equi, the remainCondition will be null; if the
condition is not pure equi, the remaining condition will be taken from
{{NonEquiJoinInfo.remaining}} and passed to the new
{{EnumerableJoin#generatePredicate}} method to create the extra predicate to be
passed to {{BuiltInMethod.HASH_JOIN.method}}, which can remain as it is right
now in the PR.
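To illustrate the idea, here is a minimal sketch in plain Java collections, not Calcite code. The names {{HashJoinSketch}}, {{chooseJoin}} and {{hashJoin}} are hypothetical and exist only for this example. It assumes an inner join, with the equi part of the condition expressed as key extractors and the remaining condition as an optional predicate (null for a pure equi-join):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

public class HashJoinSketch {

  /** Hypothetical stand-in for the rule's choice between the two operators. */
  static String chooseJoin(int equiKeyCount) {
    // Without any equi keys there is nothing to hash on, so fall back
    // to a nested loop; otherwise a hash join (plus predicate) applies.
    return equiKeyCount == 0 ? "NestedLoopJoin" : "HashJoin";
  }

  /**
   * Inner hash join on equal keys. The remaining argument is the non-equi
   * part of the condition, or null when the condition is pure equi.
   */
  static <L, R, K> List<Map.Entry<L, R>> hashJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remaining) {
    // Build phase: index the right input by its join key.
    Map<K, List<R>> index = new HashMap<>();
    for (R r : right) {
      index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    // Probe phase: find candidates by key, then apply the extra predicate.
    List<Map.Entry<L, R>> out = new ArrayList<>();
    for (L l : left) {
      for (R r : index.getOrDefault(leftKey.apply(l), List.of())) {
        if (remaining == null || remaining.test(l, r)) {
          out.add(Map.entry(l, r));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Rows are (key, value) pairs; condition: l.key = r.key AND l.value > r.value
    List<List<Integer>> left = List.of(List.of(1, 10), List.of(1, 3), List.of(2, 20));
    List<List<Integer>> right = List.of(List.of(1, 5), List.of(2, 30));
    Function<List<Integer>, Integer> key = row -> row.get(0);
    BiPredicate<List<Integer>, List<Integer>> remaining =
        (l, r) -> l.get(1) > r.get(1);

    // Partial equi-join: only (1, 10) with (1, 5) passes the extra predicate.
    System.out.println(hashJoin(left, right, key, key, remaining).size());
    // Pure equi-join: a null predicate keeps all three key matches.
    System.out.println(hashJoin(left, right, key, key, null).size());
  }
}
```

The point of the sketch is that the extra predicate only runs on pairs that already matched on the hash key, so a partial equi-join avoids comparing every left row with every right row.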
[~hhlai1990], I'm not sure if my explanation above is clear; let me know if you
have any questions or see any issues with the logic behind it.
> Allow theta joins that have equi conditions to be executed using a hash join
> algorithm
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join process will take dozens of times longer than the
> sort-merge join process.
> So if we can apply the merge-join or hash-join rule to a theta join, it will
> improve the performance greatly.