Hi Danny,
thanks for kicking off this big task, I have started to add some comments
in the PR

Best regards,
Ruben


Le lun. 15 avr. 2019 à 20:49, Julian Hyde <[email protected]> a écrit :

> I haven’t reviewed the change, but let’s strike a balance between
> backwards compatibility and progress. We should allow breaking changes if
> they make a significant improvement. Of course we should mark APIs
> deprecated for one or two releases, if possible.
>
> > On Apr 15, 2019, at 2:01 AM, Yuzhao Chen <[email protected]> wrote:
> >
> > I also have this question when i was doing this patch, cause Enumerable
> nodes are physical operators(implementation) and should be specific to
> Calcite. So in the beginning i didn’t classify them as public APIs and only
> keep the constructors and methods for logical nodes.
> >
> > Well this searching is somewhat convincing, there are many
> interfaces/methods are marked as deprecated and to be removed before 2.0,
> it would be better if the principles for backward compatibility are more
> clear.
> >
> > Best,
> > Danny Chan
> > 在 2019年4月14日 +0800 PM7:41,Hongze Zhang <[email protected]>,写道:
> >> I didn't take look on the PR in detail so far, but it seems that a
> topic about backward compatibility would be worth to discuss anyway.
> >>
> >> Regarding the Enumerable's API, I don't think there are many use cases
> of them from Calcite users. Although users may create instances in some
> custom rules, or extend the Enumerable rels' classes to implement some
> specific behaviors, I am still not sure if such cases are that usual.
> >>
> >> For example, I've run a Google search for term "EnumerableJoin.create"
> on github.com, only 2 results returned[1], and both are from
> apache/calcite project. Similar result on "EnumerableCorrelate.create". I
> am pretty sure that Google could not give a precise result about code usage
> (I don't find a way to search these terms using GitHub code search), but at
> least it shows some sort of trend. As a comparison there are 33 results[2]
> for "LogicalJoin.create", some are from external projects.
> >>
> >> So my question would be: how much backward compatibility should we
> respect when we make API changes to Calcite? To me it is not much clear. I
> know compatibility is very, very important for an Apache project (see "The
> Apache Project Maturity Model/QU40"[3]), but I am not sure if we should add
> "@Depracated" to any changed public staffs, the code will be messy and hard
> to understand.
> >>
> >> Anyway my example about EnumerableJoin/Correlate just shows my
> confusion on a broader topic. So I will be +1 to the consensus that already
> be achieved so far. But I'll be happy to hear more principles on how to
> manage the backward compatibility for Calcite, such as: what's the
> definition about Calcite's public API, or what changes would be considered
> backward-incompatible, etc. I think that will also benefit our developers a
> lot.
> >>
> >>
> >> Best,
> >> Hongze
> >>
> >>
> >> [1]
> https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
> >> [2]
> https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com
> >> [3]
> https://community.apache.org/apache-way/apache-project-maturity-model.html#quality
> >>
> >>> On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa <[email protected]>
> wrote:
> >>>
> >>> Agreed, but not sure what would the best way to do it be without
> >>> making the code very confusing.
> >>>
> >>> On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <[email protected]>
> wrote:
> >>>>
> >>>> I share the same concern with you.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Thanks~
> >>>> Haisheng
> Yuan------------------------------------------------------------------
> >>>> 发件人:Stamatis Zampetakis<[email protected]>
> >>>> 日 期:2019年04月14日 05:37:29
> >>>> 收件人:<[email protected]>
> >>>> 主 题:Re: Join, SemiJoin, Correlate
> >>>>
> >>>> Hi Danny,
> >>>>
> >>>> Thanks a lot for taking this on, it is a great start!
> >>>>
> >>>> I didn't look thoroughly through the PR but I noticed that there are
> many
> >>>> renaming/refactoring of public APIs. I am not sure if we should
> introduce
> >>>> so many breaking changes without prior notice. A most conservative
> approach
> >>>> would be to keep existing classes/methods, mark them as deprecated,
> and
> >>>> then remove them in one of the coming releases. I am not sure if that
> is
> >>>> the right way to go so let's see what the others have to say.
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <[email protected]>
> wrote:
> >>>>
> >>>>> Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis,
> >>>>> @Walaa Eldin Moustafa
> >>>>>
> >>>>> I have did the work for this discussion, and look forward to your
> >>>>> suggestions.
> >>>>>
> >>>>>
> >>>>> ### Diff
> >>>>> - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType,
> >>>>> EnumerableSemiJoinRule, EnumerableThetaJoin
> >>>>> - Make EnumerableMergeJoin extends Join instead of EquiJoin
> >>>>> - Add SEMI and ANTI join type to JoinRelType, add method
> >>>>> returnsJustFirstInput() to decide if the join only outputs left side
> >>>>> - Correlate use JoinRelType instead of SemiJoinType
> >>>>> - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it
> >>>>> exptends Join instead of Correlate
> >>>>> - Rename EnumerableJoin to EnumerableHashJoin
> >>>>> - EnumerableJoinRule will convert semi-join to
> EnumerableNestedLoopJoin
> >>>>> (EnumerableSemiJoin's function is merged into this rule)
> >>>>> - Add method isNonCorrelateSemiJoin() in Join.java to make sure if
> this
> >>>>> join is a semi-join (Comes from SemiJoinRule) or comes from
> >>>>> decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns
> value
> >>>>> true means the join is a semi-join equivalent to SemiJoin before
> this patch.
> >>>>> - Cache the JoinInfo in Join and use it to get leftKeys and
> rightKeys,
> >>>>> merge the SemiJoin#computeSelfCost to Join#computeSelfCost
> >>>>> - RelBuilder removes SemiJoinFactory, method #semiJoin now return a
> >>>>> LogicalJoin with JoinRelType#SEMI
> >>>>>
> >>>>> ### Rules tweak
> >>>>> - JoinAddRedundantSemiJoinRule now create LogicalJoin with
> >>>>> JoinRelType#SEMI instead of SemiJoin
> >>>>> - JoinToCorrelateRule remove SEMI instance and change the matchs
> condition
> >>>>> to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI
> >>>>> compared before this patch.
> >>>>> - SemiJoinRule match SEMI join specificlly
> >>>>>
> >>>>> ### Metadata tweak
> >>>>> - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to
> >>>>> getAllPredicates(Join) cause semi-join only outputs one side
> >>>>> - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount,
> >>>>> RelMdSize, RelMdUniqueKeys: merge semi-join logic to join
> >>>>>
> >>>>>
> >>>>> ### Test cases change
> >>>>> - MaterializationTest#testJoinMaterialization11 now can materialize
> >>>>> successfully, cause i allow logical SemiJoin node to match, the
> original
> >>>>> matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think
> is
> >>>>> wrong cause this will only matches subClasses of SemiJoin which is
> only
> >>>>> EnumerableSemiJoin before this patch.
> >>>>> - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because
> >>>>> CALCITE-2018, the final EnumerableSort's cost was cache by the
> previous
> >>>>> EnumerableSort with logical childs, so i remove the
> EnumerableSortRule and
> >>>>> the best plan is correct
> >>>>> - sub-query.iq has better plan for null correlate
> >>>>>
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> Danny Chan
> >>>>> 在 2019年3月21日 +0800 AM3:07,Julian Hyde <[email protected]>,写道:
> >>>>>> I just discovered that Correlate, which is neither a Join nor a
> >>>>> SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >>>>>>
> >>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
>
>

Reply via email to