I haven’t reviewed the change, but let’s strike a balance between backwards compatibility and progress. We should allow breaking changes if they make a significant improvement. Of course we should mark APIs deprecated for one or two releases, if possible.
> On Apr 15, 2019, at 2:01 AM, Yuzhao Chen <[email protected]> wrote: > > I also have this question when i was doing this patch, cause Enumerable nodes > are physical operators(implementation) and should be specific to Calcite. So > in the beginning i didn’t classify them as public APIs and only keep the > constructors and methods for logical nodes. > > Well this searching is somewhat convincing, there are many interfaces/methods > are marked as deprecated and to be removed before 2.0, it would be better if > the principles for backward compatibility are more clear. > > Best, > Danny Chan > 在 2019年4月14日 +0800 PM7:41,Hongze Zhang <[email protected]>,写道: >> I didn't take look on the PR in detail so far, but it seems that a topic >> about backward compatibility would be worth to discuss anyway. >> >> Regarding the Enumerable's API, I don't think there are many use cases of >> them from Calcite users. Although users may create instances in some custom >> rules, or extend the Enumerable rels' classes to implement some specific >> behaviors, I am still not sure if such cases are that usual. >> >> For example, I've run a Google search for term "EnumerableJoin.create" on >> github.com, only 2 results returned[1], and both are from apache/calcite >> project. Similar result on "EnumerableCorrelate.create". I am pretty sure >> that Google could not give a precise result about code usage (I don't find a >> way to search these terms using GitHub code search), but at least it shows >> some sort of trend. As a comparison there are 33 results[2] for >> "LogicalJoin.create", some are from external projects. >> >> So my question would be: how much backward compatibility should we respect >> when we make API changes to Calcite? To me it is not much clear. I know >> compatibility is very, very important for an Apache project (see "The Apache >> Project Maturity Model/QU40"[3]), but I am not sure if we should add >> "@Depracated" to any changed public staffs, the code will be messy and hard >> to understand. >> >> Anyway my example about EnumerableJoin/Correlate just shows my confusion on >> a broader topic. So I will be +1 to the consensus that already be achieved >> so far. But I'll be happy to hear more principles on how to manage the >> backward compatibility for Calcite, such as: what's the definition about >> Calcite's public API, or what changes would be considered >> backward-incompatible, etc. I think that will also benefit our developers a >> lot. >> >> >> Best, >> Hongze >> >> >> [1] >> https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com >> [2] >> https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com >> [3] >> https://community.apache.org/apache-way/apache-project-maturity-model.html#quality >> >>> On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa <[email protected]> >>> wrote: >>> >>> Agreed, but not sure what would the best way to do it be without >>> making the code very confusing. >>> >>> On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <[email protected]> >>> wrote: >>>> >>>> I share the same concern with you. >>>> >>>> >>>> >>>> >>>> >>>> Thanks~ >>>> Haisheng >>>> Yuan------------------------------------------------------------------ >>>> 发件人:Stamatis Zampetakis<[email protected]> >>>> 日 期:2019年04月14日 05:37:29 >>>> 收件人:<[email protected]> >>>> 主 题:Re: Join, SemiJoin, Correlate >>>> >>>> Hi Danny, >>>> >>>> Thanks a lot for taking this on, it is a great start! >>>> >>>> I didn't look thoroughly through the PR but I noticed that there are many >>>> renaming/refactoring of public APIs. I am not sure if we should introduce >>>> so many breaking changes without prior notice. A most conservative approach >>>> would be to keep existing classes/methods, mark them as deprecated, and >>>> then remove them in one of the coming releases. I am not sure if that is >>>> the right way to go so let's see what the others have to say. >>>> >>>> Best, >>>> Stamatis >>>> >>>> On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <[email protected]> wrote: >>>> >>>>> Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis, >>>>> @Walaa Eldin Moustafa >>>>> >>>>> I have did the work for this discussion, and look forward to your >>>>> suggestions. >>>>> >>>>> >>>>> ### Diff >>>>> - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType, >>>>> EnumerableSemiJoinRule, EnumerableThetaJoin >>>>> - Make EnumerableMergeJoin extends Join instead of EquiJoin >>>>> - Add SEMI and ANTI join type to JoinRelType, add method >>>>> returnsJustFirstInput() to decide if the join only outputs left side >>>>> - Correlate use JoinRelType instead of SemiJoinType >>>>> - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it >>>>> exptends Join instead of Correlate >>>>> - Rename EnumerableJoin to EnumerableHashJoin >>>>> - EnumerableJoinRule will convert semi-join to EnumerableNestedLoopJoin >>>>> (EnumerableSemiJoin's function is merged into this rule) >>>>> - Add method isNonCorrelateSemiJoin() in Join.java to make sure if this >>>>> join is a semi-join (Comes from SemiJoinRule) or comes from >>>>> decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns value >>>>> true means the join is a semi-join equivalent to SemiJoin before this >>>>> patch. >>>>> - Cache the JoinInfo in Join and use it to get leftKeys and rightKeys, >>>>> merge the SemiJoin#computeSelfCost to Join#computeSelfCost >>>>> - RelBuilder removes SemiJoinFactory, method #semiJoin now return a >>>>> LogicalJoin with JoinRelType#SEMI >>>>> >>>>> ### Rules tweak >>>>> - JoinAddRedundantSemiJoinRule now create LogicalJoin with >>>>> JoinRelType#SEMI instead of SemiJoin >>>>> - JoinToCorrelateRule remove SEMI instance and change the matchs condition >>>>> to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI >>>>> compared before this patch. >>>>> - SemiJoinRule match SEMI join specificlly >>>>> >>>>> ### Metadata tweak >>>>> - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to >>>>> getAllPredicates(Join) cause semi-join only outputs one side >>>>> - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount, >>>>> RelMdSize, RelMdUniqueKeys: merge semi-join logic to join >>>>> >>>>> >>>>> ### Test cases change >>>>> - MaterializationTest#testJoinMaterialization11 now can materialize >>>>> successfully, cause i allow logical SemiJoin node to match, the original >>>>> matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think is >>>>> wrong cause this will only matches subClasses of SemiJoin which is only >>>>> EnumerableSemiJoin before this patch. >>>>> - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because >>>>> CALCITE-2018, the final EnumerableSort's cost was cache by the previous >>>>> EnumerableSort with logical childs, so i remove the EnumerableSortRule and >>>>> the best plan is correct >>>>> - sub-query.iq has better plan for null correlate >>>>> >>>>> >>>>> >>>>> Best, >>>>> Danny Chan >>>>> 在 2019年3月21日 +0800 AM3:07,Julian Hyde <[email protected]>,写道: >>>>>> I just discovered that Correlate, which is neither a Join nor a >>>>> SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType. >>>>>> >>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought. >>>>>> >>>>>> Julian >>>>>> >>>>>> >>>>> >>>> >>
