I haven’t reviewed the change, but let’s strike a balance between backwards 
compatibility and progress. We should allow breaking changes if they make a 
significant improvement. Of course we should mark APIs deprecated for one or 
two releases, if possible. 

> On Apr 15, 2019, at 2:01 AM, Yuzhao Chen <[email protected]> wrote:
> 
> I also have this question when i was doing this patch, cause Enumerable nodes 
> are physical operators(implementation) and should be specific to Calcite. So 
> in the beginning i didn’t classify them as public APIs and only keep the 
> constructors and methods for logical nodes.
> 
> Well this searching is somewhat convincing, there are many interfaces/methods 
> are marked as deprecated and to be removed before 2.0, it would be better if 
> the principles for backward compatibility are more clear.
> 
> Best,
> Danny Chan
> 在 2019年4月14日 +0800 PM7:41,Hongze Zhang <[email protected]>,写道:
>> I didn't take look on the PR in detail so far, but it seems that a topic 
>> about backward compatibility would be worth to discuss anyway.
>> 
>> Regarding the Enumerable's API, I don't think there are many use cases of 
>> them from Calcite users. Although users may create instances in some custom 
>> rules, or extend the Enumerable rels' classes to implement some specific 
>> behaviors, I am still not sure if such cases are that usual.
>> 
>> For example, I've run a Google search for term "EnumerableJoin.create" on 
>> github.com, only 2 results returned[1], and both are from apache/calcite 
>> project. Similar result on "EnumerableCorrelate.create". I am pretty sure 
>> that Google could not give a precise result about code usage (I don't find a 
>> way to search these terms using GitHub code search), but at least it shows 
>> some sort of trend. As a comparison there are 33 results[2] for 
>> "LogicalJoin.create", some are from external projects.
>> 
>> So my question would be: how much backward compatibility should we respect 
>> when we make API changes to Calcite? To me it is not much clear. I know 
>> compatibility is very, very important for an Apache project (see "The Apache 
>> Project Maturity Model/QU40"[3]), but I am not sure if we should add 
>> "@Depracated" to any changed public staffs, the code will be messy and hard 
>> to understand.
>> 
>> Anyway my example about EnumerableJoin/Correlate just shows my confusion on 
>> a broader topic. So I will be +1 to the consensus that already be achieved 
>> so far. But I'll be happy to hear more principles on how to manage the 
>> backward compatibility for Calcite, such as: what's the definition about 
>> Calcite's public API, or what changes would be considered 
>> backward-incompatible, etc. I think that will also benefit our developers a 
>> lot.
>> 
>> 
>> Best,
>> Hongze
>> 
>> 
>> [1] 
>> https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
>> [2] 
>> https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
>> [3] 
>> https://community.apache.org/apache-way/apache-project-maturity-model.html#quality
>> 
>>> On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa <[email protected]> 
>>> wrote:
>>> 
>>> Agreed, but not sure what would the best way to do it be without
>>> making the code very confusing.
>>> 
>>> On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <[email protected]> 
>>> wrote:
>>>> 
>>>> I share the same concern with you.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Thanks~
>>>> Haisheng 
>>>> Yuan------------------------------------------------------------------
>>>> 发件人:Stamatis Zampetakis<[email protected]>
>>>> 日 期:2019年04月14日 05:37:29
>>>> 收件人:<[email protected]>
>>>> 主 题:Re: Join, SemiJoin, Correlate
>>>> 
>>>> Hi Danny,
>>>> 
>>>> Thanks a lot for taking this on, it is a great start!
>>>> 
>>>> I didn't look thoroughly through the PR but I noticed that there are many
>>>> renaming/refactoring of public APIs. I am not sure if we should introduce
>>>> so many breaking changes without prior notice. A most conservative approach
>>>> would be to keep existing classes/methods, mark them as deprecated, and
>>>> then remove them in one of the coming releases. I am not sure if that is
>>>> the right way to go so let's see what the others have to say.
>>>> 
>>>> Best,
>>>> Stamatis
>>>> 
>>>> On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <[email protected]> wrote:
>>>> 
>>>>> Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis,
>>>>> @Walaa Eldin Moustafa
>>>>> 
>>>>> I have did the work for this discussion, and look forward to your
>>>>> suggestions.
>>>>> 
>>>>> 
>>>>> ### Diff
>>>>> - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType,
>>>>> EnumerableSemiJoinRule, EnumerableThetaJoin
>>>>> - Make EnumerableMergeJoin extends Join instead of EquiJoin
>>>>> - Add SEMI and ANTI join type to JoinRelType, add method
>>>>> returnsJustFirstInput() to decide if the join only outputs left side
>>>>> - Correlate use JoinRelType instead of SemiJoinType
>>>>> - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it
>>>>> exptends Join instead of Correlate
>>>>> - Rename EnumerableJoin to EnumerableHashJoin
>>>>> - EnumerableJoinRule will convert semi-join to EnumerableNestedLoopJoin
>>>>> (EnumerableSemiJoin's function is merged into this rule)
>>>>> - Add method isNonCorrelateSemiJoin() in Join.java to make sure if this
>>>>> join is a semi-join (Comes from SemiJoinRule) or comes from
>>>>> decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns value
>>>>> true means the join is a semi-join equivalent to SemiJoin before this 
>>>>> patch.
>>>>> - Cache the JoinInfo in Join and use it to get leftKeys and rightKeys,
>>>>> merge the SemiJoin#computeSelfCost to Join#computeSelfCost
>>>>> - RelBuilder removes SemiJoinFactory, method #semiJoin now return a
>>>>> LogicalJoin with JoinRelType#SEMI
>>>>> 
>>>>> ### Rules tweak
>>>>> - JoinAddRedundantSemiJoinRule now create LogicalJoin with
>>>>> JoinRelType#SEMI instead of SemiJoin
>>>>> - JoinToCorrelateRule remove SEMI instance and change the matchs condition
>>>>> to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI
>>>>> compared before this patch.
>>>>> - SemiJoinRule match SEMI join specificlly
>>>>> 
>>>>> ### Metadata tweak
>>>>> - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to
>>>>> getAllPredicates(Join) cause semi-join only outputs one side
>>>>> - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount,
>>>>> RelMdSize, RelMdUniqueKeys: merge semi-join logic to join
>>>>> 
>>>>> 
>>>>> ### Test cases change
>>>>> - MaterializationTest#testJoinMaterialization11 now can materialize
>>>>> successfully, cause i allow logical SemiJoin node to match, the original
>>>>> matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think is
>>>>> wrong cause this will only matches subClasses of SemiJoin which is only
>>>>> EnumerableSemiJoin before this patch.
>>>>> - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because
>>>>> CALCITE-2018, the final EnumerableSort's cost was cache by the previous
>>>>> EnumerableSort with logical childs, so i remove the EnumerableSortRule and
>>>>> the best plan is correct
>>>>> - sub-query.iq has better plan for null correlate
>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Danny Chan
>>>>> 在 2019年3月21日 +0800 AM3:07,Julian Hyde <[email protected]>,写道:
>>>>>> I just discovered that Correlate, which is neither a Join nor a
>>>>> SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType.
>>>>>> 
>>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 

Reply via email to