Hi Danny, thanks for kicking off this big task, I have started to add some comments in the PR
Best regards, Ruben Le lun. 15 avr. 2019 à 20:49, Julian Hyde <[email protected]> a écrit : > I haven’t reviewed the change, but let’s strike a balance between > backwards compatibility and progress. We should allow breaking changes if > they make a significant improvement. Of course we should mark APIs > deprecated for one or two releases, if possible. > > > On Apr 15, 2019, at 2:01 AM, Yuzhao Chen <[email protected]> wrote: > > > > I also have this question when i was doing this patch, cause Enumerable > nodes are physical operators(implementation) and should be specific to > Calcite. So in the beginning i didn’t classify them as public APIs and only > keep the constructors and methods for logical nodes. > > > > Well this searching is somewhat convincing, there are many > interfaces/methods are marked as deprecated and to be removed before 2.0, > it would be better if the principles for backward compatibility are more > clear. > > > > Best, > > Danny Chan > > 在 2019年4月14日 +0800 PM7:41,Hongze Zhang <[email protected]>,写道: > >> I didn't take look on the PR in detail so far, but it seems that a > topic about backward compatibility would be worth to discuss anyway. > >> > >> Regarding the Enumerable's API, I don't think there are many use cases > of them from Calcite users. Although users may create instances in some > custom rules, or extend the Enumerable rels' classes to implement some > specific behaviors, I am still not sure if such cases are that usual. > >> > >> For example, I've run a Google search for term "EnumerableJoin.create" > on github.com, only 2 results returned[1], and both are from > apache/calcite project. Similar result on "EnumerableCorrelate.create". I > am pretty sure that Google could not give a precise result about code usage > (I don't find a way to search these terms using GitHub code search), but at > least it shows some sort of trend. As a comparison there are 33 results[2] > for "LogicalJoin.create", some are from external projects. > >> > >> So my question would be: how much backward compatibility should we > respect when we make API changes to Calcite? To me it is not much clear. I > know compatibility is very, very important for an Apache project (see "The > Apache Project Maturity Model/QU40"[3]), but I am not sure if we should add > "@Depracated" to any changed public staffs, the code will be messy and hard > to understand. > >> > >> Anyway my example about EnumerableJoin/Correlate just shows my > confusion on a broader topic. So I will be +1 to the consensus that already > be achieved so far. But I'll be happy to hear more principles on how to > manage the backward compatibility for Calcite, such as: what's the > definition about Calcite's public API, or what changes would be considered > backward-incompatible, etc. I think that will also benefit our developers a > lot. > >> > >> > >> Best, > >> Hongze > >> > >> > >> [1] > https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com > >> [2] > https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com > >> [3] > https://community.apache.org/apache-way/apache-project-maturity-model.html#quality > >> > >>> On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa <[email protected]> > wrote: > >>> > >>> Agreed, but not sure what would the best way to do it be without > >>> making the code very confusing. > >>> > >>> On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <[email protected]> > wrote: > >>>> > >>>> I share the same concern with you. > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> Thanks~ > >>>> Haisheng > Yuan------------------------------------------------------------------ > >>>> 发件人:Stamatis Zampetakis<[email protected]> > >>>> 日 期:2019年04月14日 05:37:29 > >>>> 收件人:<[email protected]> > >>>> 主 题:Re: Join, SemiJoin, Correlate > >>>> > >>>> Hi Danny, > >>>> > >>>> Thanks a lot for taking this on, it is a great start! > >>>> > >>>> I didn't look thoroughly through the PR but I noticed that there are > many > >>>> renaming/refactoring of public APIs. I am not sure if we should > introduce > >>>> so many breaking changes without prior notice. A most conservative > approach > >>>> would be to keep existing classes/methods, mark them as deprecated, > and > >>>> then remove them in one of the coming releases. I am not sure if that > is > >>>> the right way to go so let's see what the others have to say. > >>>> > >>>> Best, > >>>> Stamatis > >>>> > >>>> On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <[email protected]> > wrote: > >>>> > >>>>> Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis, > >>>>> @Walaa Eldin Moustafa > >>>>> > >>>>> I have did the work for this discussion, and look forward to your > >>>>> suggestions. > >>>>> > >>>>> > >>>>> ### Diff > >>>>> - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType, > >>>>> EnumerableSemiJoinRule, EnumerableThetaJoin > >>>>> - Make EnumerableMergeJoin extends Join instead of EquiJoin > >>>>> - Add SEMI and ANTI join type to JoinRelType, add method > >>>>> returnsJustFirstInput() to decide if the join only outputs left side > >>>>> - Correlate use JoinRelType instead of SemiJoinType > >>>>> - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it > >>>>> exptends Join instead of Correlate > >>>>> - Rename EnumerableJoin to EnumerableHashJoin > >>>>> - EnumerableJoinRule will convert semi-join to > EnumerableNestedLoopJoin > >>>>> (EnumerableSemiJoin's function is merged into this rule) > >>>>> - Add method isNonCorrelateSemiJoin() in Join.java to make sure if > this > >>>>> join is a semi-join (Comes from SemiJoinRule) or comes from > >>>>> decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns > value > >>>>> true means the join is a semi-join equivalent to SemiJoin before > this patch. > >>>>> - Cache the JoinInfo in Join and use it to get leftKeys and > rightKeys, > >>>>> merge the SemiJoin#computeSelfCost to Join#computeSelfCost > >>>>> - RelBuilder removes SemiJoinFactory, method #semiJoin now return a > >>>>> LogicalJoin with JoinRelType#SEMI > >>>>> > >>>>> ### Rules tweak > >>>>> - JoinAddRedundantSemiJoinRule now create LogicalJoin with > >>>>> JoinRelType#SEMI instead of SemiJoin > >>>>> - JoinToCorrelateRule remove SEMI instance and change the matchs > condition > >>>>> to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI > >>>>> compared before this patch. > >>>>> - SemiJoinRule match SEMI join specificlly > >>>>> > >>>>> ### Metadata tweak > >>>>> - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to > >>>>> getAllPredicates(Join) cause semi-join only outputs one side > >>>>> - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount, > >>>>> RelMdSize, RelMdUniqueKeys: merge semi-join logic to join > >>>>> > >>>>> > >>>>> ### Test cases change > >>>>> - MaterializationTest#testJoinMaterialization11 now can materialize > >>>>> successfully, cause i allow logical SemiJoin node to match, the > original > >>>>> matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think > is > >>>>> wrong cause this will only matches subClasses of SemiJoin which is > only > >>>>> EnumerableSemiJoin before this patch. > >>>>> - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because > >>>>> CALCITE-2018, the final EnumerableSort's cost was cache by the > previous > >>>>> EnumerableSort with logical childs, so i remove the > EnumerableSortRule and > >>>>> the best plan is correct > >>>>> - sub-query.iq has better plan for null correlate > >>>>> > >>>>> > >>>>> > >>>>> Best, > >>>>> Danny Chan > >>>>> 在 2019年3月21日 +0800 AM3:07,Julian Hyde <[email protected]>,写道: > >>>>>> I just discovered that Correlate, which is neither a Join nor a > >>>>> SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType. > >>>>>> > >>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought. > >>>>>> > >>>>>> Julian > >>>>>> > >>>>>> > >>>>> > >>>> > >> > >
