[ 
https://issues.apache.org/jira/browse/CALCITE-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009310#comment-17009310
 ] 

Jin Xing commented on CALCITE-3667:
-----------------------------------

Hi, [~julianhyde] ~
 Thanks again looking into this, and I really appreciate your help and shepherd 
~

The original implementation of MergeJoinEnumerator#advance(before CALCITE-3258) 
is as below
{code:java}
advance() {
      ... ...
      cartesians = Linq4j.product(
          ImmutableList.of(Linq4j.enumerator(lefts),
              Linq4j.enumerator(rights)));
      return true;
}
{code}
In which, matched keys from LHS and RHS are collected into *lefts* and 
*rights*, i.e. keys in *lefts* and *rights* are all the same. And then we 
generate cartesians from the rows of LHS and RHS with the same key.

After PR-1717, the change is as below:
{code:java}
advance() {
      ... ...
      cartesians = new ConditionalEnumerator<>(
          Linq4j.product(
              ImmutableList.of(Linq4j.enumerator(lefts), 
Linq4j.enumerator(rights))),
          v -> predicate.apply((TSource) v.get(0), (TInner) v.get(1)));
      return true;
}
{code}
In which, I wrapped a *ConditionalEnumerator* and filtering rows with 
*predicate*. Since I'm filtering based on rows with the same key each time, the 
complexity is decided by the duplication of the keys.

1. If all rows shares the single same key, yes, the complexity is O(m * n);
 2. If all rows have distinct keys, the complexity is O(m + n)

> EnumerableMergeJoin should not use  take-while enumerator
> ---------------------------------------------------------
>
>                 Key: CALCITE-3667
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3667
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Jin Xing
>            Assignee: Jin Xing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently EnumerableMergeJoin use take-while enumerator [1] to emit values 
> satisfy predicate. However take-while enumerator stops the enumeration at 
> once when predicate is failed, which is not correct and we should finish the 
> enumeration and extract all the qualified  values.
> [1] 
> https://github.com/apache/calcite/blob/master/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3896



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to