[
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838246#comment-16838246
]
Lai Zhou commented on CALCITE-2973:
-----------------------------------
[~zabetak], regarding the query you mentioned:
{code:java}
SELECT e.name FROM emp e INNER JOIN department d ON e.address.zipcode =
d.zipcode{code}
I added a test for it and found that the RexFieldAccess `e.address.zipcode` is
converted to a new RexInputRef by JoinPushExpressionsRule; see
[https://github.com/apache/calcite/blob/6afa38bae794462e6e250237a1b60cc4220b2885/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L3290].
Please see the latest commit, which includes a test named
`leftOuterJoinWithPredicateContainsRexFieldAccess` in EnumerableJoinTest.
I agree that the rule-based approach you proposed would also work for this
issue. But I still think it is somewhat complicated, and introducing a new
projection seems likely to add computational overhead.
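To make the idea concrete, here is a minimal, standalone sketch of a hash join that uses the equi part of the condition as the hash key and applies the remaining (non-equi) part as a residual predicate on each key match. It is not Calcite code: the class `HashJoinSketch`, the method `hashJoin`, and all parameter names are hypothetical and only illustrate the technique. The key extractors are plain functions, so a nested field such as `e.address.zipcode` could be read directly without a separate projection step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical illustration only; not part of Calcite. */
public class HashJoinSketch {

  /**
   * Inner hash join: rows are matched on an equi key, then filtered by a
   * residual (non-equi) predicate.
   */
  public static <L, R, K> List<Map.Entry<L, R>> hashJoin(
      List<L> left,
      List<R> right,
      Function<L, K> leftKey,
      Function<R, K> rightKey,
      BiPredicate<L, R> residual) {
    // Build phase: index the right input by its equi key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      K key = rightKey.apply(r);
      if (key == null) {
        continue; // SQL semantics: a NULL key never satisfies an equality
      }
      table.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
    }

    // Probe phase: look up each left row, then test the residual predicate.
    List<Map.Entry<L, R>> out = new ArrayList<>();
    for (L l : left) {
      K key = leftKey.apply(l);
      if (key == null) {
        continue;
      }
      List<R> matches = table.get(key);
      if (matches == null) {
        continue;
      }
      for (R r : matches) {
        if (residual.test(l, r)) {
          out.add(new AbstractMap.SimpleEntry<>(l, r));
        }
      }
    }
    return out;
  }
}
```

With inputs of sizes n and m, the build and probe phases touch each row once, and the residual predicate runs only on rows that already agree on the key, instead of on all n*m pairs as in a nested-loop join.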
> Allow theta joins that have equi conditions to be executed using a hash join
> algorithm
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-2973
> URL: https://issues.apache.org/jira/browse/CALCITE-2973
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.20.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query on a large dataset (such as 10000*10000),
> the nested-loop join takes dozens of times longer than the sort-merge
> join.
> So if we can apply the merge-join or hash-join rule to a theta join, it will
> improve performance greatly.