Thx Ruben, very appreciate for your suggestions, I will make modifications soon.

The main diff it to move semiJoin physical implementations to 
EnumerableHashJoin, i also see that now we use method `contains` to probe the 
right side(inner loop), but there also 2 implementations for `contains`
One is Hash probe and one is loop look up, I’m not very sure it is indeed a 
hash-semi or nestedLoop-semi.

[1] 
https://github.com/apache/calcite/blob/b03cdc486cf5c7232bbc6fa9b5f02f564e9601c3/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1312

Best,
Danny Chan
在 2019年4月18日 +0800 AM1:01,Ruben Q L <[email protected]>,写道:
> Hi Danny,
> thanks for kicking off this big task, I have started to add some comments
> in the PR
>
> Best regards,
> Ruben
>
>
> Le lun. 15 avr. 2019 à 20:49, Julian Hyde <[email protected]> a écrit :
>
> > I haven’t reviewed the change, but let’s strike a balance between
> > backwards compatibility and progress. We should allow breaking changes if
> > they make a significant improvement. Of course we should mark APIs
> > deprecated for one or two releases, if possible.
> >
> > > On Apr 15, 2019, at 2:01 AM, Yuzhao Chen <[email protected]> wrote:
> > >
> > > I also have this question when i was doing this patch, cause Enumerable
> > nodes are physical operators(implementation) and should be specific to
> > Calcite. So in the beginning i didn’t classify them as public APIs and only
> > keep the constructors and methods for logical nodes.
> > >
> > > Well this searching is somewhat convincing, there are many
> > interfaces/methods are marked as deprecated and to be removed before 2.0,
> > it would be better if the principles for backward compatibility are more
> > clear.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月14日 +0800 PM7:41,Hongze Zhang <[email protected]>,写道:
> > > > I didn't take look on the PR in detail so far, but it seems that a
> > topic about backward compatibility would be worth to discuss anyway.
> > > >
> > > > Regarding the Enumerable's API, I don't think there are many use cases
> > of them from Calcite users. Although users may create instances in some
> > custom rules, or extend the Enumerable rels' classes to implement some
> > specific behaviors, I am still not sure if such cases are that usual.
> > > >
> > > > For example, I've run a Google search for term "EnumerableJoin.create"
> > on github.com, only 2 results returned[1], and both are from
> > apache/calcite project. Similar result on "EnumerableCorrelate.create". I
> > am pretty sure that Google could not give a precise result about code usage
> > (I don't find a way to search these terms using GitHub code search), but at
> > least it shows some sort of trend. As a comparison there are 33 results[2]
> > for "LogicalJoin.create", some are from external projects.
> > > >
> > > > So my question would be: how much backward compatibility should we
> > respect when we make API changes to Calcite? To me it is not much clear. I
> > know compatibility is very, very important for an Apache project (see "The
> > Apache Project Maturity Model/QU40"[3]), but I am not sure if we should add
> > "@Depracated" to any changed public staffs, the code will be messy and hard
> > to understand.
> > > >
> > > > Anyway my example about EnumerableJoin/Correlate just shows my
> > confusion on a broader topic. So I will be +1 to the consensus that already
> > be achieved so far. But I'll be happy to hear more principles on how to
> > manage the backward compatibility for Calcite, such as: what's the
> > definition about Calcite's public API, or what changes would be considered
> > backward-incompatible, etc. I think that will also benefit our developers a
> > lot.
> > > >
> > > >
> > > > Best,
> > > > Hongze
> > > >
> > > >
> > > > [1]
> > https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
> > > > [2]
> > https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
> > > > [3]
> > https://community.apache.org/apache-way/apache-project-maturity-model.html#quality
> > > >
> > > > > On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa 
> > > > > <[email protected]>
> > wrote:
> > > > >
> > > > > Agreed, but not sure what would the best way to do it be without
> > > > > making the code very confusing.
> > > > >
> > > > > On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <[email protected]>
> > wrote:
> > > > > >
> > > > > > I share the same concern with you.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks~
> > > > > > Haisheng
> > Yuan------------------------------------------------------------------
> > > > > > 发件人:Stamatis Zampetakis<[email protected]>
> > > > > > 日 期:2019年04月14日 05:37:29
> > > > > > 收件人:<[email protected]>
> > > > > > 主 题:Re: Join, SemiJoin, Correlate
> > > > > >
> > > > > > Hi Danny,
> > > > > >
> > > > > > Thanks a lot for taking this on, it is a great start!
> > > > > >
> > > > > > I didn't look thoroughly through the PR but I noticed that there are
> > many
> > > > > > renaming/refactoring of public APIs. I am not sure if we should
> > introduce
> > > > > > so many breaking changes without prior notice. A most conservative
> > approach
> > > > > > would be to keep existing classes/methods, mark them as deprecated,
> > and
> > > > > > then remove them in one of the coming releases. I am not sure if 
> > > > > > that
> > is
> > > > > > the right way to go so let's see what the others have to say.
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
> > > > > > On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <[email protected]>
> > wrote:
> > > > > >
> > > > > > > Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis,
> > > > > > > @Walaa Eldin Moustafa
> > > > > > >
> > > > > > > I have did the work for this discussion, and look forward to your
> > > > > > > suggestions.
> > > > > > >
> > > > > > >
> > > > > > > ### Diff
> > > > > > > - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType,
> > > > > > > EnumerableSemiJoinRule, EnumerableThetaJoin
> > > > > > > - Make EnumerableMergeJoin extends Join instead of EquiJoin
> > > > > > > - Add SEMI and ANTI join type to JoinRelType, add method
> > > > > > > returnsJustFirstInput() to decide if the join only outputs left 
> > > > > > > side
> > > > > > > - Correlate use JoinRelType instead of SemiJoinType
> > > > > > > - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make 
> > > > > > > it
> > > > > > > exptends Join instead of Correlate
> > > > > > > - Rename EnumerableJoin to EnumerableHashJoin
> > > > > > > - EnumerableJoinRule will convert semi-join to
> > EnumerableNestedLoopJoin
> > > > > > > (EnumerableSemiJoin's function is merged into this rule)
> > > > > > > - Add method isNonCorrelateSemiJoin() in Join.java to make sure if
> > this
> > > > > > > join is a semi-join (Comes from SemiJoinRule) or comes from
> > > > > > > decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns
> > value
> > > > > > > true means the join is a semi-join equivalent to SemiJoin before
> > this patch.
> > > > > > > - Cache the JoinInfo in Join and use it to get leftKeys and
> > rightKeys,
> > > > > > > merge the SemiJoin#computeSelfCost to Join#computeSelfCost
> > > > > > > - RelBuilder removes SemiJoinFactory, method #semiJoin now return 
> > > > > > > a
> > > > > > > LogicalJoin with JoinRelType#SEMI
> > > > > > >
> > > > > > > ### Rules tweak
> > > > > > > - JoinAddRedundantSemiJoinRule now create LogicalJoin with
> > > > > > > JoinRelType#SEMI instead of SemiJoin
> > > > > > > - JoinToCorrelateRule remove SEMI instance and change the matchs
> > condition
> > > > > > > to !join.getJoinType().generatesNullsOnLeft() which also allowed 
> > > > > > > ANTI
> > > > > > > compared before this patch.
> > > > > > > - SemiJoinRule match SEMI join specificlly
> > > > > > >
> > > > > > > ### Metadata tweak
> > > > > > > - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to
> > > > > > > getAllPredicates(Join) cause semi-join only outputs one side
> > > > > > > - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount,
> > > > > > > RelMdSize, RelMdUniqueKeys: merge semi-join logic to join
> > > > > > >
> > > > > > >
> > > > > > > ### Test cases change
> > > > > > > - MaterializationTest#testJoinMaterialization11 now can 
> > > > > > > materialize
> > > > > > > successfully, cause i allow logical SemiJoin node to match, the
> > original
> > > > > > > matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i 
> > > > > > > think
> > is
> > > > > > > wrong cause this will only matches subClasses of SemiJoin which is
> > only
> > > > > > > EnumerableSemiJoin before this patch.
> > > > > > > - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because
> > > > > > > CALCITE-2018, the final EnumerableSort's cost was cache by the
> > previous
> > > > > > > EnumerableSort with logical childs, so i remove the
> > EnumerableSortRule and
> > > > > > > the best plan is correct
> > > > > > > - sub-query.iq has better plan for null correlate
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2019年3月21日 +0800 AM3:07,Julian Hyde <[email protected]>,写道:
> > > > > > > > I just discovered that Correlate, which is neither a Join nor a
> > > > > > > SemiJoin, uses SemiJoinType, but SemiJoin does not use 
> > > > > > > SemiJoinType.
> > > > > > > >
> > > > > > > > Yuck. The Join/SemiJoin/Correlate type hierarchy needs some 
> > > > > > > > thought.
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
> >

Reply via email to