I was also looking at this. Will keep you posted.

Thanks,
Walaa.

On Mon, Apr 1, 2019 at 4:53 AM Yuzhao Chen <[email protected]> wrote:

> I will take it; I hope I can be of some help.
>
> Best,
> Danny Chan
> On Apr 1, 2019, at 7:20 PM +0800, Stamatis Zampetakis <[email protected]> wrote:
> > It seems that the discussion has more or less converged (at least on the major
> > points). I created CALCITE-2969, for whoever decides to tackle this
> issue.
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-2969
> >
> > On Mon, Mar 25, 2019 at 8:57 PM, Julian Hyde <[email protected]> wrote:
> >
> > > Generally +1 on what Haisheng says. Specifically:
> > >
> > > I like the idea of renaming EnumerableCorrelate, and making it not
> extend
> > > Correlate. I would choose EnumerableNestedLoopJoin rather than
> > > EnumerableNestLoopJoin.
> > >
> > > Shifting from LogicalCorrelate to LogicalApply is worth considering.
> > > LogicalApply is similar to the “map” operator in functional
> programming, or
> > > “selectMany” in LINQ, so is very well-behaved and powerful - a good
> > > abstraction.
> > >
> > > Regarding SemiJoin and EquiJoin. Maybe we could deprecate them, or
> maybe
> > > we could convert them to interfaces. I’ll leave that decision to
> whoever
> > > actually writes the code. If we moved a few things to interfaces
> (including
> > > JoinLike I mentioned earlier) maybe we’d get out of the gridlock
> caused by
> > > the type hierarchy.
> > >
> > > Regarding when to decorrelate. Decorrelation during sql-to-rel is
> legacy.
> > > We now prefer to decorrelate using rules, in RelNode-land. There may be
> > > bugs in the legacy decorrelation and we do not aggressively fix them.
> We
> > > can even start to remove functionality if it helps us make
> > > SqlToRelConverter simpler.
> > >
> > > Julian
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > > On Mar 25, 2019, at 12:23 PM, Haisheng Yuan <[email protected]>
> > > wrote:
> > > >
> > > > I agree with Stamatis that JoinRelType should have values:
> > > > Inner, Left (Outer), Full (Outer), Semi, Anti.
> > > > The option of right outer join is not necessary, because we can flip
> the
> > > inner/outer to left outer join.
> > > >
> > > > SemiJoin and EquiJoin can be deprecated.
> > > >
> > > > EnumerableCorrelate is confusing, correlate is a logical concept,
> better
> > > to rename it to EnumerableNestLoopJoin. SemiJoin can be implemented by
> > > nestloop, hashjoin or mergejoin.
> > > > I don’t see the necessity of having a separate physical operator
> > > EnumerableSemiJoin.
> > > > But these are minor naming issues.
> > > >
> > > > Regarding LogicalCorrelate, I view it as an operator similar to
> > > > LogicalApply [1], the logical operator used by Microsoft SQL Server
> > > > and the Greenplum Orca optimizer. Both use LogicalApply to represent
> > > > a correlated join, in which the inner side references a variable set
> > > > by the outer side. Apply comes in several types: cross apply (or
> > > > inner apply), outer apply, semi apply, and anti-semi apply. These are
> > > > just a subset of the join types, which may be why it is associated
> > > > with JoinRelType, or reuses it. The main difference between
> > > > Correlate (or Apply) and Join is, logically speaking: in Correlate,
> > > > the inner side references the outer side; in Join, it does not.
> > > > NestLoopJoin can implement both.
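The distinction described above (in Correlate/Apply the inner side references the outer row, in Join it does not) can be sketched with plain Java collections. This is only an illustration: `NestedLoopSketch`, `join`, and `apply` are hypothetical names, not Calcite classes, and rows are reduced to integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration; none of these names exist in Calcite. */
class NestedLoopSketch {
  /** Join: the inner input is fixed and does not depend on the outer row. */
  static <L, R> List<String> join(List<L> outer, List<R> inner,
      BiPredicate<L, R> condition) {
    List<String> out = new ArrayList<>();
    for (L l : outer) {
      for (R r : inner) {
        if (condition.test(l, r)) {
          out.add(l + "," + r);
        }
      }
    }
    return out;
  }

  /** Correlate/Apply: the inner input is recomputed from each outer row. */
  static <L, R> List<String> apply(List<L> outer, Function<L, List<R>> inner) {
    List<String> out = new ArrayList<>();
    for (L l : outer) {
      for (R r : inner.apply(l)) {
        out.add(l + "," + r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> depts = List.of(10, 20);
    List<Integer> empDepts = List.of(10, 10, 30);
    // Same result, expressed once as a join and once as an apply.
    System.out.println(join(depts, empDepts, (d, e) -> d.equals(e)));
    System.out.println(apply(depts,
        d -> empDepts.stream().filter(e -> e.equals(d))
            .collect(Collectors.toList())));
  }
}
```

Both methods are nested loops; the only difference is whether the inner collection is a value or a function of the outer row, which is the "map"/"selectMany" shape mentioned earlier in the thread.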
> > > >
> > > > With optimizer transformation rules, a Correlate (or Apply) can be
> > > > transformed into a Join, or a Join can be transformed into a
> > > > Correlate (or Apply) when there is an index that can be used on the
> > > > inner relation.
> > > >
> > > > What I am not comfortable with is this:
> > > > In SQL Server and the GPDB Orca optimizer, SQL is translated into a
> > > > logical relation as is (subqueries are kept as they are), and then
> > > > various apply rules unnest the subqueries based on the cost model,
> > > > which seems reasonable to me.
> > > > But in Calcite, we can decorrelate not only in the SqlToRel stage but
> > > > also in SubqueryRemoveRule.
> > > > Should we unify all of this in the rules and keep SqlToRelConverter
> > > > simple?
> > > >
> > > > Thanks ~
> > > > Haisheng Yuan
> > > > ------------------------------------------------------------------
> > > > From: Stamatis Zampetakis<[email protected]>
> > > > Date: 2019-03-23 07:31:35
> > > > To: <[email protected]>
> > > > Subject: Re: Join, SemiJoin, Correlate
> > > >
> > > > Since we are discussing this topic, I thought it would be good to
> > > > bring back to the surface a similar discussion [1] that took place
> > > > on this list a few years ago.
> > > >
> > > > I am leaning towards option 3 where JoinRelType has all necessary
> values:
> > > > Inner, Left, Semi, Anti, and Full.
> > > > With these changes it seems we could remove (deprecate) also
> SemiJoin,
> > > and
> > > > EquiJoin.
> > > >
> > > > On the physical level we could have:
> > > > 1. EnumerableCorrelate or EnumerableNestedLoopJoin;
> > > > 2. EnumerableMergeJoin;
> > > > 3. EnumerableHashJoin (currently EnumerableJoin)
> > > >
> > > > and for the above we could pass the JoinRelType throwing an exception
> > > when
> > > > the specific algorithm cannot be used to implement a specific type of
> > > join.
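The idea of passing the join type to each physical operator and failing when the algorithm cannot implement it could look roughly like the sketch below. All names (`JoinKind`, `PhysicalJoinSketch`, and the two subclasses) are hypothetical, and the set of types each algorithm accepts is an assumption made only for illustration, not a statement about what Calcite's operators support.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical join types, following the values proposed in the thread. */
enum JoinKind { INNER, LEFT, FULL, SEMI, ANTI }

/** Hypothetical physical join that validates its join type on construction. */
abstract class PhysicalJoinSketch {
  final JoinKind kind;

  PhysicalJoinSketch(JoinKind kind, Set<JoinKind> supported) {
    if (!supported.contains(kind)) {
      throw new IllegalArgumentException(
          getClass().getSimpleName() + " cannot implement " + kind + " join");
    }
    this.kind = kind;
  }
}

class NestedLoopJoinSketch extends PhysicalJoinSketch {
  NestedLoopJoinSketch(JoinKind kind) {
    // Assumption: the nested-loop sketch accepts every join type.
    super(kind, EnumSet.allOf(JoinKind.class));
  }
}

class MergeJoinSketch extends PhysicalJoinSketch {
  MergeJoinSketch(JoinKind kind) {
    // Assumption for illustration only: restrict this sketch to INNER.
    super(kind, EnumSet.of(JoinKind.INNER));
  }
}
```

With this shape, `new NestedLoopJoinSketch(JoinKind.ANTI)` succeeds, while `new MergeJoinSketch(JoinKind.SEMI)` throws `IllegalArgumentException`, so a planner rule would need to check the supported set before creating the operator.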
> > > >
> > > > EnumerableSemiJoin and EnumerableThetaJoin could also be removed and
> > > > covered from the above I think.
> > > >
> > > > Regarding Correlate and LogicalCorrelate, I am not sure what we
> > > > should do.
> > > > Associating the JoinRelType with it does not seem right, and making
> > > > Correlate also a Join is not very attractive either.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> > >
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB%3DJe-H7AWEHbKzjrRHd-YcZgkgWzFORALrz_mMc2k7WDdj54Q%40mail.gmail.com%3E
> > > >
> > > >
> > > > On Thu, Mar 21, 2019 at 10:35 PM, Walaa Eldin Moustafa <[email protected]> wrote:
> > > >
> > > > > > Is your concern with how we have structured the class hierarchy?
> Or
> > > just
> > > > > how we describe Correlate in the documentation?
> > > > >
> > > > > My concern is with both, but mainly the former.
> > > > >
> > > > > > I do agree that Correlate and nested loops joins are not the
> same (one
> > > > > is logical, the other physical). However, they have a lot in
> common, in
> > > > > > particular the fact that one input sets variables and the other input
> reads
> > > those
> > > > > variables.
> > > > >
> > > > > I think this commonality describes how the query is written, but
> not
> > > > > necessarily what it is logically equivalent to. It also describes
> the
> > > > > "how", and not necessarily the "what". I would say logical
> > > > > representations should be concerned with the "what" part.
> > > > >
> > > > > > I can’t think of any way to represent a nested loops join (e.g.
> for
> > > each
> > > > > department, find all employees in that department) that does not
> use
> > > > > variables to tie together the two inputs. And therefore I am happy
> with
> > > the
> > > > > fact that our Java implementation of nested-loops join is ‘class
> > > > > EnumerableCorrelate extends Correlate’.
> > > > >
> > > > > That is correct. The two variables are required. At the logical
> level
> > > > > they are mapped to the Correlate variables, or the Join keys after
> > > > > decorrelation. After going to physical, we can only have join keys.
> > > > > One of the keys can be the basis for the outer loop and the other
> for
> > > > > the inner loop if needed. That is true for both Correlate and Join
> > > > > operators. Both keys can even be used in another way than forming
> > > > > nested loops such as using them to implement hash or merge joins
> > > > > (again for regular Join or Correlate join after decorrelation).
> > > > >
> > > > > Thanks,
> > > > > Walaa.
> > > > >
> > > > > On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde <[email protected]>
> wrote:
> > > > > >
> > > > > > > In addition, I would not present Correlate
> > > > > > > as a nested loops join.
> > > > > >
> > > > > >
> > > > > > Is your concern with how we have structured the class hierarchy?
> Or
> > > just
> > > > > how we describe Correlate in the documentation?
> > > > > >
> > > > > > I do agree that Correlate and nested loops joins are not the
> same (one
> > > > > is logical, the other physical). However, they have a lot in
> common, in
> > > > > > particular the fact that one input sets variables and the other input
> reads
> > > those
> > > > > variables.
> > > > > >
> > > > > > I can’t think of any way to represent a nested loops join (e.g.
> for
> > > each
> > > > > department, find all employees in that department) that does not
> use
> > > > > variables to tie together the two inputs. And therefore I am happy
> with
> > > the
> > > > > fact that our Java implementation of nested-loops join is ‘class
> > > > > EnumerableCorrelate extends Correlate’.
> > > > > >
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > > On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <
> > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > I would vote for number 3. In addition, I would not present
> Correlate
> > > > > > > as a nested loops join. Moreover, nested loops, hash and merge
> joins
> > > > > > > should be able to map to both Join or Correlate logical ones
> when
> > > > > > > possible (no inherent correlation between logical join type and
> > > > > > > physical types).
> > > > > > >
> > > > > > > On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]
> >
> > > wrote:
> > > > > > > >
> > > > > > > > I have a few ideas for refactorings. (I’m not convinced by
> any of
> > > > > them, but let me know which you like.)
> > > > > > > >
> > > > > > > > 1. Get rid of SemiJoinType. It is mis-named (it is not used
> by
> > > > > SemiJoin, it is used by Correlate, but in a field called joinType).
> > > > > > > >
> > > > > > > > 2. In Correlate, use
> org.apache.calcite.linq4j.CorrelateJoinType. It
> > > > > has the same set of values as SemiJoinType, but it has a better
> name.
> > > > > > > >
> > > > > > > > 3. Get rid of both SemiJoinType and CorrelateJoinType, and
> use
> > > > > JoinRelType for everything. We would have to add SEMI and ANTI
> values.
> > > Also
> > > > > some methods to find out whether the resulting row type contains
> fields
> > > > > from the left and right inputs or just the left input.
> > > > > > > >
> > > > > > > > 4. Add “interface JoinLike extends BiRel” and make Join,
> SemiJoin and
> > > > > > Correlate implement it. It would have methods that say whether
> the LHS
> > > > > and RHS generate nulls, and whether the output row type contains
> columns
> > > > > from the right input. This seems attractive because it lets Join,
> > > SemiJoin
> > > > > and Correlate continue to be structurally different.
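Options 3 and 4 above might be sketched as follows. `SketchJoinType` and `SketchJoinLike` are hypothetical stand-ins written for this illustration, and the method names are assumptions rather than an existing Calcite API.

```java
/** Hypothetical enum for option 3: one join type with SEMI and ANTI added. */
enum SketchJoinType {
  INNER(false, false, true),
  LEFT(false, true, true),
  RIGHT(true, false, true),
  FULL(true, true, true),
  SEMI(false, false, false),
  ANTI(false, false, false);

  private final boolean nullsOnLeft;
  private final boolean nullsOnRight;
  private final boolean projectsRight;

  SketchJoinType(boolean nullsOnLeft, boolean nullsOnRight,
      boolean projectsRight) {
    this.nullsOnLeft = nullsOnLeft;
    this.nullsOnRight = nullsOnRight;
    this.projectsRight = projectsRight;
  }

  /** Whether left-input columns may be null-padded in the output. */
  boolean generatesNullsOnLeft() { return nullsOnLeft; }

  /** Whether right-input columns may be null-padded in the output. */
  boolean generatesNullsOnRight() { return nullsOnRight; }

  /** Whether the output row type contains columns from the right input. */
  boolean projectsRight() { return projectsRight; }
}

/** Hypothetical interface for option 4; operators stay structurally distinct. */
interface SketchJoinLike {
  SketchJoinType joinType();

  default boolean projectsRight() {
    return joinType().projectsRight();
  }
}

class SketchJoinTypeDemo {
  public static void main(String[] args) {
    for (SketchJoinType t : SketchJoinType.values()) {
      System.out.println(t + " projectsRight=" + t.projectsRight());
    }
  }
}
```

The two options are not mutually exclusive in this sketch: the enum answers the row-type questions, and the interface lets otherwise unrelated operator classes expose them uniformly.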
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <
> [email protected]>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > SubPlan (in Postgres’ term) is a Postgres physical
> relational node
> > > > > to evaluate correlated subquery. What I mean is correlated
> subquery that
> > > > > can’t be decorrelated can’t be implemented by hashjoin or
> mergejoin.
> > > But it
> > > > > is off topic.
> > > > > > > > >
> > > > > > > > > Thanks ~
> > > > > > > > > Haisheng Yuan
> > > > > > > > >
> ------------------------------------------------------------------
> > > > > > > > > > From: Walaa Eldin Moustafa<[email protected]>
> > > > > > > > > > Date: 2019-03-21 09:31:41
> > > > > > > > > > To: <[email protected]>
> > > > > > > > > > Cc: Stamatis Zampetakis<[email protected]>
> > > > > > > > > > Subject: Re: Re: Join, SemiJoin, Correlate
> > > > > > > > >
> > > > > > > > > Agreed with Stamatis. Currently: 1) Correlate is tied to
> IN, EXISTS,
> > > > > > > > > NOT IN, NOT EXISTS, and 2) is used as an equivalent to
> nested loops
> > > > > > > > > join. The issues here are: 1) IN, EXISTS, NOT IN, NOT
> EXISTS can be
> > > > > > > > > rewritten as semi/anti joins, and 2) nested loops join is
> more of a
> > > > > > > > > physical operator.
> > > > > > > > >
> > > > > > > > > > It seems that the minimal set of logical join types is
> INNER, LEFT,
> > > > > > > > > RIGHT, OUTER, SEMI, ANTI.
> > > > > > > > >
> > > > > > > > > > So I think Calcite could have one LogicalJoin operator
> with an
> > > > > > > > > attribute to specify the join type (from the above), and a
> number of
> > > > > > > > > physical join operators (hash, merge, nested loops) whose
> > > > > > > > > > implementation details depend on the join type.
> > > > > > > > >
> > > > > > > > > What we lose by this model is the structure of the query
> (whether
> > > > > > > > > there was a sub-plan or not), but I would say that this is
> actually
> > > > > > > > > what is desired from a logical representation -- to
> abstract away
> > > > > from
> > > > > > > > > how the query is written, and how it is structured, as
> long as there
> > > > > > > > > is a canonical representation. There could also be a world
> where
> > > both
> > > > > > > > > models coexist (Correlate first then Decorrelate but in
> the light of
> > > > > a
> > > > > > > > > single logical join operator?).
> > > > > > > > >
> > > > > > > > > @Haisheng, generally, a sub-plan can also be implemented
> using a
> > > > > > > > > variant of hash or merge joins as long as we evaluate the
> sub-plan
> > > > > > > > > independently (without the join predicate), but that is up
> to the
> > > > > > > > > optimizer.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Walaa.
> > > > > > > > >
> > > > > > > > > On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <
> > > > > [email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > SemiJoinType and its relationship with JoinRelType do
> confuse me a
> > > > > little bit.
> > > > > > > > > >
> > > > > > > > > > > But I don’t think we should drop LogicalCorrelate.
> It is useful
> > > > > to represent the lateral or correlated subquery (aka SubPlan in
> Postgres
> > > > > jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
> > > > > Calcite, or SubPlan in Postgres’s terminology, but it can’t be
> > > implemented
> > > > > as HashJoin or MergeJoin.
> > > > > > > > > >
> > > > > > > > > > Thanks ~
> > > > > > > > > > Haisheng Yuan
> > > > > > > > > >
> ------------------------------------------------------------------
> > > > > > > > > > > From: Stamatis Zampetakis<[email protected]>
> > > > > > > > > > > Date: 2019-03-21 07:13:15
> > > > > > > > > > > To: <[email protected]>
> > > > > > > > > > > Subject: Re: Join, SemiJoin, Correlate
> > > > > > > > > >
> > > > > > > > > > I have bumped into this quite a few times and I think we
> should
> > > > > really try
> > > > > > > > > > to improve the design of the join hierarchy.
> > > > > > > > > >
> > > > > > > > > > From a logical point of view I think it makes sense to
> have the
> > > > > following
> > > > > > > > > > operators:
> > > > > > > > > > InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin,
> AntiJoin,
> > > > > (GroupJoin)
> > > > > > > > > >
> > > > > > > > > > > I have not yet thought thoroughly about what should become a
> class, and
> > > > > what a
> > > > > > > > > > property of the class (e.g., JoinRelType, SemiJoinType).
> > > > > > > > > >
> > > > > > > > > > Moreover, Correlate as it is right now, is basically a
> nested loop
> > > > > join (as
> > > > > > > > > > its Javadoc also indicates).
> > > > > > > > > > Nested loop join is most often encountered as a physical
> operator
> > > > > so I am
> > > > > > > > > > not sure if it should remain as is (in particular the
> > > > > LogicalCorrelate).
> > > > > > > > > > As we do not have HashJoin, MergeJoin, etc., operators
> at the
> > > > > logical
> > > > > > > > > > level, I think we should not have a NestedLoopJoin (aka.,
> > > > > LogicalCorrelate).
> > > > > > > > > > There are valid reasons why Correlate was introduced in
> the first
> > > > > place but
> > > > > > > > > > I think we should rethink a bit the design and the needs.
> > > > > > > > > >
> > > > > > > > > > > @Julian: I do not know to what extent you would like to
> rethink the
> > > > > > > > > > hierarchy but I have the impression that even small
> changes can
> > > > > easily
> > > > > > > > > > break backward compatibility.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 20, 2019 at 8:07 PM, Julian Hyde <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > I just discovered that Correlate, which is neither a
> Join nor a
> > > > > SemiJoin,
> > > > > > > > > > > uses SemiJoinType, but SemiJoin does not use
> SemiJoinType.
> > > > > > > > > > >
> > > > > > > > > > > Yuck. The Join/SemiJoin/Correlate type hierarchy needs
> some
> > > > > thought.
> > > > > > > > > > >
> > > > > > > > > > > Julian
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
>
