I agree with Stamatis that JoinRelType should have values:
Inner, Left (Outer), Full (Outer), Semi, Anti.
The option of right outer join is not necessary, because we can flip the 
inner/outer to left outer join.

SemiJoin and EquiJoin can be deprecated.

EnumerableCorrelate is confusing, correlate is a logical concept, better to 
rename it to EnumerableNestLoopJoin. SemiJoin can be implemented by nestloop, 
hashjoin or mergejoin.
I don’t see the necessity of having a separate physical operator 
EnumerableSemiJoin. 
But these are minor naming issue.

Regarding the LogicalCorrelate, I view it as a kind of operator similar to 
LogicalApply [1], which is
the logical operator in Microsoft SQL Server and Greenplum Orca optimizer. Both 
uses LogicalApply
operator to represent the correlated join that inner has reference to the outer 
variable. The apply may
have different type: cross apply (or inner apply), outer apply, semi apply, 
anti-semi apply. They are
just subset of join types, maybe it is why it is acciociated with JoinRelType, 
or reuse. The main
difference between Correlate (or Apply) and Join is (logically speaking): In 
Correlate, inner has 
reference to outer. In Join, inner doesn’t reference outer. NestLoopJoin can 
implement both.

With optimizer transformation rules, Correlate (or Apply) can be transformed 
into a Join, or a Join
is transfomed into a Correlate (Or Apply), in case there is an index can be 
used in inner relation.

What I am not comfortable with is:
In SQL Server and GPDB Orca optimzier, Sql is translated into logical relation 
as it should be (
keep subquery as it is), then use all kinds of apply rules to unnest subqueries 
based on cost model,
which seems reasonable to me.
But in Calcite, we can not only decorrelate in SqlToRel stage, but also can do 
it in SubqueryRemoveRule.
Should we unify them all in the rules and keep SqlToRelConverter simple?

Thanks ~
Haisheng Yuan
------------------------------------------------------------------
发件人:Stamatis Zampetakis<[email protected]>
日 期:2019年03月23日 07:31:35
收件人:<[email protected]>
主 题:Re: Join, SemiJoin, Correlate

Since we are discussing this topic I thought it would be could to bring
back
to the surface a similar discussion [1] that has been done a few years ago
in this list.

I am leaning towards option 3 where JoinRelType has all necessary values:
Inner, Left, Semi, Anti, and Full.
With these changes it seems we could remove (deprecate) also SemiJoin, and
EquiJoin.

On the physical level we could have:
1. EnumerableCorrelate or EnumerableNestedLoopJoin;
2. EnumerableMergeJoin;
3. EnumerableHashJoin (currently EnumerableJoin)

and for the above we could pass the JoinRelType throwing an exception when
the specific algorithm cannot be used to implement a specific type of join.

EnumerableSemiJoin and EnumerableThetaJoin could also be removed and
covered from the above I think.

Regarding Correlate and LogicalCorrelate, I am not sure what should we do.
Associating the JoinRelType with it does not seem right, and making
Correlate also a Join is not very attractive either.

Best,
Stamatis

[1]
http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB%3DJe-H7AWEHbKzjrRHd-YcZgkgWzFORALrz_mMc2k7WDdj54Q%40mail.gmail.com%3E


Στις Πέμ, 21 Μαρ 2019 στις 10:35 μ.μ., ο/η Walaa Eldin Moustafa <
[email protected]> έγραψε:

> > Is your concern with how we have structured the class hierarchy? Or just
> how we describe Correlate in the documentation?
>
> My concern is with both, but mainly the former.
>
> > I do agree that Correlate and nested loops joins are not the same (one
> is logical, the other physical). However, they have a lot in common, in
> particular the fact that one input sets variables and the input reads those
> variables.
>
> I think this commonality describes how the query is written, but not
> necessarily what it is logically equivalent to. It also describes the
> "how", and not necessarily the "what". I would say logical
> representations should be concerned with the "what" part.
>
> > I can’t think of any way to represent a nested loops join (e.g. for each
> department, find all employees in that department) that does not use
> variables to tie together the two inputs. And therefore I am happy with the
> fact that our Java implementation of nested-loops join is ‘class
> EnumerableCorrelate extends Correlate’.
>
> That is correct. The two variables are required. At the logical level
> they are mapped to the Correlate variables, or the Join keys after
> decorrelation. After going to physical, we can only have join keys.
> One of the keys can be the basis for the outer loop and the other for
> the inner loop if needed. That is true for both Correlate and Join
> operators. Both keys can even be used in another way than forming
> nested loops such as using them to implement hash or merge joins
> (again for regular Join or Correlate join after decorrelation).
>
> Thanks,
> Walaa.
>
> On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde <[email protected]> wrote:
> >
> > > In addition, I would not present Correlate
> > > as a nested loops join.
> >
> >
> > Is your concern with how we have structured the class hierarchy? Or just
> how we describe Correlate in the documentation?
> >
> > I do agree that Correlate and nested loops joins are not the same (one
> is logical, the other physical). However, they have a lot in common, in
> particular the fact that one input sets variables and the input reads those
> variables.
> >
> > I can’t think of any way to represent a nested loops join (e.g. for each
> department, find all employees in that department) that does not use
> variables to tie together the two inputs. And therefore I am happy with the
> fact that our Java implementation of nested-loops join is ‘class
> EnumerableCorrelate extends Correlate’.
> >
> >
> > Julian
> >
> > > On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <
> [email protected]> wrote:
> > >
> > > I would vote for number 3. In addition, I would not present Correlate
> > > as a nested loops join. Moreover, nested loops, hash and merge joins
> > > should be able to map to both Join or Correlate logical ones when
> > > possible (no inherent correlation between logical join type and
> > > physical types).
> > >
> > > On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]> wrote:
> > >>
> > >> I have a few ideas for refactorings. (I’m not convinced by any of
> them, but let me know which you like.)
> > >>
> > >> 1. Get rid of SemiJoinType. It is mis-named (it is not used by
> SemiJoin, it is used by Correlate, but in a field called joinType).
> > >>
> > >> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It
> has the same set of values as SemiJoinType, but it has a better name.
> > >>
> > >> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use
> JoinRelType for everything. We would have to add SEMI and ANTI values. Also
> some methods to find out whether the resulting row type contains fields
> from the left and right inputs or just the left input.
> > >>
> > >> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and
> Correlate implement it. It would have a methods that say whether the LHS
> and RHS generate nulls, and whether the output row type contains columns
> from the right input. This seems attractive because it lets Join, SemiJoin
> and Correlate continue to be structurally different.
> > >>
> > >> Julian
> > >>
> > >>
> > >>
> > >>
> > >>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]>
> wrote:
> > >>>
> > >>> SubPlan (in Postgres’ term) is a Postgres physical relational node
> to evaluate correlated subquery. What I mean is correlated subquery that
> can’t be decorrelated can’t be implemented by hashjoin or mergejoin. But it
> is off topic.
> > >>>
> > >>> Thanks ~
> > >>> Haisheng Yuan
> > >>> ------------------------------------------------------------------
> > >>> 发件人:Walaa Eldin Moustafa<[email protected]>
> > >>> 日 期:2019年03月21日 09:31:41
> > >>> 收件人:<[email protected]>
> > >>> 抄 送:Stamatis Zampetakis<[email protected]>
> > >>> 主 题:Re: Re: Join, SemiJoin, Correlate
> > >>>
> > >>> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> > >>> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> > >>> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> > >>> rewritten as semi/anti joins, and 2) nested loops join is more of a
> > >>> physical operator.
> > >>>
> > >>> It seems that the minimal set of logical join types are INNER, LEFT,
> > >>> RIGHT, OUTER, SEMI, ANTI.
> > >>>
> > >>> So I think Calciate could have one LogicalJoin operator with an
> > >>> attribute to specify the join type (from the above), and a number of
> > >>> physical join operators (hash, merge, nested loops) whose
> > >>> implementation details depend on the the join type.
> > >>>
> > >>> What we lose by this model is the structure of the query (whether
> > >>> there was a sub-plan or not), but I would say that this is actually
> > >>> what is desired from a logical representation -- to abstract away
> from
> > >>> how the query is written, and how it is structured, as long as there
> > >>> is a canonical representation. There could also be a world where both
> > >>> models coexist (Correlate first then Decorrelate but in the light of
> a
> > >>> single logical join operator?).
> > >>>
> > >>> @Haisheng, generally, a sub-plan can also be implemented using a
> > >>> variant of hash or merge joins as long as we evaluate the sub-plan
> > >>> independently (without the join predicate), but that is up to the
> > >>> optimizer.
> > >>>
> > >>> Thanks,
> > >>> Walaa.
> > >>>
> > >>> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <
> [email protected]> wrote:
> > >>>>
> > >>>> SemiJoinType and its relationship with JoinRelType do confuse me a
> little bit.
> > >>>>
> > >>>> But I don’t think we should not have LogicalCorrelate. It is useful
> to represent the lateral or correlated subquery (aka SubPlan in Postgres
> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented
> as HashJoin or MergeJoin.
> > >>>>
> > >>>> Thanks ~
> > >>>> Haisheng Yuan
> > >>>> ------------------------------------------------------------------
> > >>>> 发件人:Stamatis Zampetakis<[email protected]>
> > >>>> 日 期:2019年03月21日 07:13:15
> > >>>> 收件人:<[email protected]>
> > >>>> 主 题:Re: Join, SemiJoin, Correlate
> > >>>>
> > >>>> I have bumped into this quite a few times and I think we should
> really try
> > >>>> to improve the design of the join hierarchy.
> > >>>>
> > >>>> From a logical point of view I think it makes sense to have the
> following
> > >>>> operators:
> > >>>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin,
> (GroupJoin)
> > >>>>
> > >>>> Yet I have not thought thoroughly what should become a class, and
> what a
> > >>>> property of the class (e.g., JoinRelType, SemiJoinType).
> > >>>>
> > >>>> Moreover, Correlate as it is right now, is basically a nested loop
> join (as
> > >>>> its Javadoc also indicates).
> > >>>> Nested loop join is most often encountered as a physical operator
> so I am
> > >>>> not sure if it should remain as is (in particular the
> LogicalCorrelate).
> > >>>> As we do not have HashJoin, MergeJoin, etc., operators at the
> logical
> > >>>> level, I think we should not have a NestedLoopJoin (aka.,
> LogicalCorrelate).
> > >>>> There are valid reasons why Correlate was introduced in the first
> place but
> > >>>> I think we should rethink a bit the design and the needs.
> > >>>>
> > >>>> @Julian: I do not know to what extend you would like to rethink the
> > >>>> hierarchy but I have the impression that even small changes can
> easily
> > >>>> break backward compatibility.
> > >>>>
> > >>>>
> > >>>> Στις Τετ, 20 Μαρ 2019 στις 8:07 μ.μ., ο/η Julian Hyde <
> [email protected]>
> > >>>> έγραψε:
> > >>>>
> > >>>>> I just discovered that Correlate, which is neither a Join nor a
> SemiJoin,
> > >>>>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> > >>>>>
> > >>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some
> thought.
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> >
>

Reply via email to