I didn't take look on the PR in detail so far, but it seems that a topic about 
backward compatibility would be worth to discuss anyway.

Regarding the Enumerable's API, I don't think there are many use cases of them 
from Calcite users. Although users may create instances in some custom rules, 
or extend the Enumerable rels' classes to implement some specific behaviors, I 
am still not sure if such cases are that usual.

For example, I've run a Google search for term "EnumerableJoin.create" on 
github.com, only 2 results returned[1], and both are from apache/calcite 
project. Similar result on "EnumerableCorrelate.create". I am pretty sure that 
Google could not give a precise result about code usage (I don't find a way to 
search these terms using GitHub code search), but at least it shows some sort 
of trend. As a comparison there are 33 results[2] for "LogicalJoin.create", 
some are from external projects.

So my question would be: how much backward compatibility should we respect when 
we make API changes to Calcite? To me it is not much clear. I know 
compatibility is very, very important for an Apache project (see "The Apache 
Project Maturity Model/QU40"[3]), but I am not sure if we should add 
"@Depracated" to any changed public staffs, the code will be messy and hard to 
understand.

Anyway my example about EnumerableJoin/Correlate just shows my confusion on a 
broader topic. So I will be +1 to the consensus that already be achieved so 
far. But I'll be happy to hear more principles on how to manage the backward 
compatibility for Calcite, such as: what's the definition about Calcite's 
public API, or what changes would be considered backward-incompatible, etc. I 
think that will also benefit our developers a lot.


Best,
Hongze


[1] 
https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
[2] 
https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
[3] 
https://community.apache.org/apache-way/apache-project-maturity-model.html#quality

> On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> 
> Agreed, but not sure what would the best way to do it be without
> making the code very confusing.
> 
> On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
>> 
>> I share the same concern with you.
>> 
>> 
>> 
>> 
>> 
>> Thanks~
>> Haisheng 
>> Yuan------------------------------------------------------------------
>> 发件人:Stamatis Zampetakis<zabe...@gmail.com>
>> 日 期:2019年04月14日 05:37:29
>> 收件人:<dev@calcite.apache.org>
>> 主 题:Re: Join, SemiJoin, Correlate
>> 
>> Hi Danny,
>> 
>> Thanks a lot for taking this on, it is a great start!
>> 
>> I didn't look thoroughly through the PR but I noticed that there are many
>> renaming/refactoring of public APIs. I am not sure if we should introduce
>> so many breaking changes without prior notice. A most conservative approach
>> would be to keep existing classes/methods, mark them as deprecated, and
>> then remove them in one of the coming releases. I am not sure if that is
>> the right way to go so let's see what the others have to say.
>> 
>> Best,
>> Stamatis
>> 
>> On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <yuzhao....@gmail.com> wrote:
>> 
>>> Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis,
>>> @Walaa Eldin Moustafa
>>> 
>>> I have did the work for this discussion, and look forward to your
>>> suggestions.
>>> 
>>> 
>>> ### Diff
>>> - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType,
>>> EnumerableSemiJoinRule, EnumerableThetaJoin
>>> - Make EnumerableMergeJoin extends Join instead of EquiJoin
>>> - Add SEMI and ANTI join type to JoinRelType, add method
>>> returnsJustFirstInput() to decide if the join only outputs left side
>>> - Correlate use JoinRelType instead of SemiJoinType
>>> - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it
>>> exptends Join instead of Correlate
>>> - Rename EnumerableJoin to EnumerableHashJoin
>>> - EnumerableJoinRule will convert semi-join to EnumerableNestedLoopJoin
>>> (EnumerableSemiJoin's function is merged into this rule)
>>> - Add method isNonCorrelateSemiJoin() in Join.java to make sure if this
>>> join is a semi-join (Comes from SemiJoinRule) or comes from
>>> decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns value
>>> true means the join is a semi-join equivalent to SemiJoin before this patch.
>>> - Cache the JoinInfo in Join and use it to get leftKeys and rightKeys,
>>> merge the SemiJoin#computeSelfCost to Join#computeSelfCost
>>> - RelBuilder removes SemiJoinFactory, method #semiJoin now return a
>>> LogicalJoin with JoinRelType#SEMI
>>> 
>>> ### Rules tweak
>>> - JoinAddRedundantSemiJoinRule now create LogicalJoin with
>>> JoinRelType#SEMI instead of SemiJoin
>>> - JoinToCorrelateRule remove SEMI instance and change the matchs condition
>>> to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI
>>> compared before this patch.
>>> - SemiJoinRule match SEMI join specificlly
>>> 
>>> ### Metadata tweak
>>> - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to
>>> getAllPredicates(Join) cause semi-join only outputs one side
>>> - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount,
>>> RelMdSize, RelMdUniqueKeys: merge semi-join logic to join
>>> 
>>> 
>>> ### Test cases change
>>> - MaterializationTest#testJoinMaterialization11 now can materialize
>>> successfully, cause i allow logical SemiJoin node to match, the original
>>> matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think is
>>> wrong cause this will only matches subClasses of SemiJoin which is only
>>> EnumerableSemiJoin before this patch.
>>> - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because
>>> CALCITE-2018, the final EnumerableSort's cost was cache by the previous
>>> EnumerableSort with logical childs, so i remove the EnumerableSortRule and
>>> the best plan is correct
>>> - sub-query.iq has better plan for null correlate
>>> 
>>> 
>>> 
>>> Best,
>>> Danny Chan
>>> 在 2019年3月21日 +0800 AM3:07,Julian Hyde <jh...@apache.org>,写道:
>>>> I just discovered that Correlate, which is neither a Join nor a
>>> SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType.
>>>> 
>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>> 
>> 

Reply via email to