It seems that the discusion has somehow converged (at least to the major
points). I created CALCITE-2969, for whoever decides to tackle this issue.

[1] https://issues.apache.org/jira/browse/CALCITE-2969

Στις Δευ, 25 Μαρ 2019 στις 8:57 μ.μ., ο/η Julian Hyde <[email protected]>
έγραψε:

> Generally +1 on what Haisheng says. Specifically:
>
> I like the idea of renaming EnumerableCorrelate, and making it not extend
> Correlate. I would choose EnumerableNestedLoopJoin rather than
> EnumerableNestLoopJoin.
>
> Shifting from LogicalCorrelate to LogicalApply is worth considering.
> LogicalApply is similar to the “map” operator in functional programming, or
> “selectMany” in LINQ, so is very well-behaved and powerful - a good
> abstraction.
>
> Regarding SemiJoin and EquiJoin. Maybe we could deprecate them, or maybe
> we could convert them to interfaces. I’ll leave that decision to whoever
> actually writes the code. If we moved a few things to interfaces (including
> JoinLike I mentioned earlier) maybe we’d get out of the gridlock caused by
> the type hierarchy.
>
> Regarding when to decorrelate. Decorrelation during sql-to-rel is legacy.
> We now prefer to decorrelate using rules, in RelNode-land. There may be
> bugs in the legacy decorrelation and we do not aggressively fix them. We
> can even start to remove functionality if it helps us make
> SqlToRelConverter simpler.
>
> Julian
>
>
>
>
>
>
>
>
> > On Mar 25, 2019, at 12:23 PM, Haisheng Yuan <[email protected]>
> wrote:
> >
> > I agree with Stamatis that JoinRelType should have values:
> > Inner, Left (Outer), Full (Outer), Semi, Anti.
> > The option of right outer join is not necessary, because we can flip the
> inner/outer to left outer join.
> >
> > SemiJoin and EquiJoin can be deprecated.
> >
> > EnumerableCorrelate is confusing, correlate is a logical concept, better
> to rename it to EnumerableNestLoopJoin. SemiJoin can be implemented by
> nestloop, hashjoin or mergejoin.
> > I don’t see the necessity of having a separate physical operator
> EnumerableSemiJoin.
> > But these are minor naming issue.
> >
> > Regarding the LogicalCorrelate, I view it as a kind of operator similar
> to LogicalApply [1], which is
> > the logical operator in Microsoft SQL Server and Greenplum Orca
> optimizer. Both uses LogicalApply
> > operator to represent the correlated join that inner has reference to
> the outer variable. The apply may
> > have different type: cross apply (or inner apply), outer apply, semi
> apply, anti-semi apply. They are
> > just subset of join types, maybe it is why it is acciociated with
> JoinRelType, or reuse. The main
> > difference between Correlate (or Apply) and Join is (logically
> speaking): In Correlate, inner has
> > reference to outer. In Join, inner doesn’t reference outer. NestLoopJoin
> can implement both.
> >
> > With optimizer transformation rules, Correlate (or Apply) can be
> transformed into a Join, or a Join
> > is transfomed into a Correlate (Or Apply), in case there is an index can
> be used in inner relation.
> >
> > What I am not comfortable with is:
> > In SQL Server and GPDB Orca optimzier, Sql is translated into logical
> relation as it should be (
> > keep subquery as it is), then use all kinds of apply rules to unnest
> subqueries based on cost model,
> > which seems reasonable to me.
> > But in Calcite, we can not only decorrelate in SqlToRel stage, but also
> can do it in SubqueryRemoveRule.
> > Should we unify them all in the rules and keep SqlToRelConverter simple?
> >
> > Thanks ~
> > Haisheng Yuan
> > ------------------------------------------------------------------
> > 发件人:Stamatis Zampetakis<[email protected]>
> > 日 期:2019年03月23日 07:31:35
> > 收件人:<[email protected]>
> > 主 题:Re: Join, SemiJoin, Correlate
> >
> > Since we are discussing this topic I thought it would be could to bring
> > back
> > to the surface a similar discussion [1] that has been done a few years
> ago
> > in this list.
> >
> > I am leaning towards option 3 where JoinRelType has all necessary values:
> > Inner, Left, Semi, Anti, and Full.
> > With these changes it seems we could remove (deprecate) also SemiJoin,
> and
> > EquiJoin.
> >
> > On the physical level we could have:
> > 1. EnumerableCorrelate or EnumerableNestedLoopJoin;
> > 2. EnumerableMergeJoin;
> > 3. EnumerableHashJoin (currently EnumerableJoin)
> >
> > and for the above we could pass the JoinRelType throwing an exception
> when
> > the specific algorithm cannot be used to implement a specific type of
> join.
> >
> > EnumerableSemiJoin and EnumerableThetaJoin could also be removed and
> > covered from the above I think.
> >
> > Regarding Correlate and LogicalCorrelate, I am not sure what should we
> do.
> > Associating the JoinRelType with it does not seem right, and making
> > Correlate also a Join is not very attractive either.
> >
> > Best,
> > Stamatis
> >
> > [1]
> >
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB%3DJe-H7AWEHbKzjrRHd-YcZgkgWzFORALrz_mMc2k7WDdj54Q%40mail.gmail.com%3E
> >
> >
> > Στις Πέμ, 21 Μαρ 2019 στις 10:35 μ.μ., ο/η Walaa Eldin Moustafa <
> > [email protected]> έγραψε:
> >
> >>> Is your concern with how we have structured the class hierarchy? Or
> just
> >> how we describe Correlate in the documentation?
> >>
> >> My concern is with both, but mainly the former.
> >>
> >>> I do agree that Correlate and nested loops joins are not the same (one
> >> is logical, the other physical). However, they have a lot in common, in
> >> particular the fact that one input sets variables and the input reads
> those
> >> variables.
> >>
> >> I think this commonality describes how the query is written, but not
> >> necessarily what it is logically equivalent to. It also describes the
> >> "how", and not necessarily the "what". I would say logical
> >> representations should be concerned with the "what" part.
> >>
> >>> I can’t think of any way to represent a nested loops join (e.g. for
> each
> >> department, find all employees in that department) that does not use
> >> variables to tie together the two inputs. And therefore I am happy with
> the
> >> fact that our Java implementation of nested-loops join is ‘class
> >> EnumerableCorrelate extends Correlate’.
> >>
> >> That is correct. The two variables are required. At the logical level
> >> they are mapped to the Correlate variables, or the Join keys after
> >> decorrelation. After going to physical, we can only have join keys.
> >> One of the keys can be the basis for the outer loop and the other for
> >> the inner loop if needed. That is true for both Correlate and Join
> >> operators. Both keys can even be used in another way than forming
> >> nested loops such as using them to implement hash or merge joins
> >> (again for regular Join or Correlate join after decorrelation).
> >>
> >> Thanks,
> >> Walaa.
> >>
> >> On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde <[email protected]> wrote:
> >>>
> >>>> In addition, I would not present Correlate
> >>>> as a nested loops join.
> >>>
> >>>
> >>> Is your concern with how we have structured the class hierarchy? Or
> just
> >> how we describe Correlate in the documentation?
> >>>
> >>> I do agree that Correlate and nested loops joins are not the same (one
> >> is logical, the other physical). However, they have a lot in common, in
> >> particular the fact that one input sets variables and the input reads
> those
> >> variables.
> >>>
> >>> I can’t think of any way to represent a nested loops join (e.g. for
> each
> >> department, find all employees in that department) that does not use
> >> variables to tie together the two inputs. And therefore I am happy with
> the
> >> fact that our Java implementation of nested-loops join is ‘class
> >> EnumerableCorrelate extends Correlate’.
> >>>
> >>>
> >>> Julian
> >>>
> >>>> On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <
> >> [email protected]> wrote:
> >>>>
> >>>> I would vote for number 3. In addition, I would not present Correlate
> >>>> as a nested loops join. Moreover, nested loops, hash and merge joins
> >>>> should be able to map to both Join or Correlate logical ones when
> >>>> possible (no inherent correlation between logical join type and
> >>>> physical types).
> >>>>
> >>>> On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]>
> wrote:
> >>>>>
> >>>>> I have a few ideas for refactorings. (I’m not convinced by any of
> >> them, but let me know which you like.)
> >>>>>
> >>>>> 1. Get rid of SemiJoinType. It is mis-named (it is not used by
> >> SemiJoin, it is used by Correlate, but in a field called joinType).
> >>>>>
> >>>>> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It
> >> has the same set of values as SemiJoinType, but it has a better name.
> >>>>>
> >>>>> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use
> >> JoinRelType for everything. We would have to add SEMI and ANTI values.
> Also
> >> some methods to find out whether the resulting row type contains fields
> >> from the left and right inputs or just the left input.
> >>>>>
> >>>>> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and
> >> Correlate implement it. It would have a methods that say whether the LHS
> >> and RHS generate nulls, and whether the output row type contains columns
> >> from the right input. This seems attractive because it lets Join,
> SemiJoin
> >> and Correlate continue to be structurally different.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]>
> >> wrote:
> >>>>>>
> >>>>>> SubPlan (in Postgres’ term) is a Postgres physical relational node
> >> to evaluate correlated subquery. What I mean is correlated subquery that
> >> can’t be decorrelated can’t be implemented by hashjoin or mergejoin.
> But it
> >> is off topic.
> >>>>>>
> >>>>>> Thanks ~
> >>>>>> Haisheng Yuan
> >>>>>> ------------------------------------------------------------------
> >>>>>> 发件人:Walaa Eldin Moustafa<[email protected]>
> >>>>>> 日 期:2019年03月21日 09:31:41
> >>>>>> 收件人:<[email protected]>
> >>>>>> 抄 送:Stamatis Zampetakis<[email protected]>
> >>>>>> 主 题:Re: Re: Join, SemiJoin, Correlate
> >>>>>>
> >>>>>> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> >>>>>> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> >>>>>> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> >>>>>> rewritten as semi/anti joins, and 2) nested loops join is more of a
> >>>>>> physical operator.
> >>>>>>
> >>>>>> It seems that the minimal set of logical join types are INNER, LEFT,
> >>>>>> RIGHT, OUTER, SEMI, ANTI.
> >>>>>>
> >>>>>> So I think Calciate could have one LogicalJoin operator with an
> >>>>>> attribute to specify the join type (from the above), and a number of
> >>>>>> physical join operators (hash, merge, nested loops) whose
> >>>>>> implementation details depend on the the join type.
> >>>>>>
> >>>>>> What we lose by this model is the structure of the query (whether
> >>>>>> there was a sub-plan or not), but I would say that this is actually
> >>>>>> what is desired from a logical representation -- to abstract away
> >> from
> >>>>>> how the query is written, and how it is structured, as long as there
> >>>>>> is a canonical representation. There could also be a world where
> both
> >>>>>> models coexist (Correlate first then Decorrelate but in the light of
> >> a
> >>>>>> single logical join operator?).
> >>>>>>
> >>>>>> @Haisheng, generally, a sub-plan can also be implemented using a
> >>>>>> variant of hash or merge joins as long as we evaluate the sub-plan
> >>>>>> independently (without the join predicate), but that is up to the
> >>>>>> optimizer.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Walaa.
> >>>>>>
> >>>>>> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <
> >> [email protected]> wrote:
> >>>>>>>
> >>>>>>> SemiJoinType and its relationship with JoinRelType do confuse me a
> >> little bit.
> >>>>>>>
> >>>>>>> But I don’t think we should not have LogicalCorrelate. It is useful
> >> to represent the lateral or correlated subquery (aka SubPlan in Postgres
> >> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
> >> Calcite, or SubPlan in Postgres’s terminology, but it can’t be
> implemented
> >> as HashJoin or MergeJoin.
> >>>>>>>
> >>>>>>> Thanks ~
> >>>>>>> Haisheng Yuan
> >>>>>>> ------------------------------------------------------------------
> >>>>>>> 发件人:Stamatis Zampetakis<[email protected]>
> >>>>>>> 日 期:2019年03月21日 07:13:15
> >>>>>>> 收件人:<[email protected]>
> >>>>>>> 主 题:Re: Join, SemiJoin, Correlate
> >>>>>>>
> >>>>>>> I have bumped into this quite a few times and I think we should
> >> really try
> >>>>>>> to improve the design of the join hierarchy.
> >>>>>>>
> >>>>>>> From a logical point of view I think it makes sense to have the
> >> following
> >>>>>>> operators:
> >>>>>>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin,
> >> (GroupJoin)
> >>>>>>>
> >>>>>>> Yet I have not thought thoroughly what should become a class, and
> >> what a
> >>>>>>> property of the class (e.g., JoinRelType, SemiJoinType).
> >>>>>>>
> >>>>>>> Moreover, Correlate as it is right now, is basically a nested loop
> >> join (as
> >>>>>>> its Javadoc also indicates).
> >>>>>>> Nested loop join is most often encountered as a physical operator
> >> so I am
> >>>>>>> not sure if it should remain as is (in particular the
> >> LogicalCorrelate).
> >>>>>>> As we do not have HashJoin, MergeJoin, etc., operators at the
> >> logical
> >>>>>>> level, I think we should not have a NestedLoopJoin (aka.,
> >> LogicalCorrelate).
> >>>>>>> There are valid reasons why Correlate was introduced in the first
> >> place but
> >>>>>>> I think we should rethink a bit the design and the needs.
> >>>>>>>
> >>>>>>> @Julian: I do not know to what extend you would like to rethink the
> >>>>>>> hierarchy but I have the impression that even small changes can
> >> easily
> >>>>>>> break backward compatibility.
> >>>>>>>
> >>>>>>>
> >>>>>>> Στις Τετ, 20 Μαρ 2019 στις 8:07 μ.μ., ο/η Julian Hyde <
> >> [email protected]>
> >>>>>>> έγραψε:
> >>>>>>>
> >>>>>>>> I just discovered that Correlate, which is neither a Join nor a
> >> SemiJoin,
> >>>>>>>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >>>>>>>>
> >>>>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some
> >> thought.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> >
>
>

Reply via email to