Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
rewritten as semi/anti joins, and 2) nested loops join is more of a
physical operator.
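
To make point 1) concrete, here is a minimal sketch in Python (not
Calcite code; the table contents and the function names where_exists,
semi_join, anti_join and inner_join are made up for illustration). It
shows EXISTS / NOT EXISTS filtering producing the same rows as a semi /
anti join, and why a plain inner join is not a substitute. NOT IN is
left out on purpose, since its NULL semantics need extra care in a
rewrite.

```python
# Illustrative only: plain Python lists stand in for relations.
emps = [
    {"name": "Ann", "deptno": 10},
    {"name": "Bob", "deptno": 20},
    {"name": "Cat", "deptno": 30},
]
depts = [{"deptno": 10}, {"deptno": 20}, {"deptno": 20}]


def same_dept(emp, dept):
    return emp["deptno"] == dept["deptno"]


def where_exists(left, right, pred):
    """Subquery form: keep a left row if the correlated subquery is non-empty."""
    return [l for l in left if any(pred(l, r) for r in right)]


def where_not_exists(left, right, pred):
    """Subquery form: keep a left row if the correlated subquery is empty."""
    return [l for l in left if not any(pred(l, r) for r in right)]


def semi_join(left, right, pred):
    """Join form: emit each left row at most once, on its first match."""
    out = []
    for l in left:
        for r in right:
            if pred(l, r):
                out.append(l)
                break  # stop at the first match, so no duplicates
    return out


def anti_join(left, right, pred):
    """Join form: emit each left row that has no match at all."""
    out = []
    for l in left:
        if all(not pred(l, r) for r in right):
            out.append(l)
    return out


def inner_join(left, right, pred):
    """For contrast: emits one output row per matching pair."""
    return [l for l in left for r in right if pred(l, r)]
```

With this data, where_exists and semi_join both keep Ann and Bob once
each, where_not_exists and anti_join both keep only Cat, and inner_join
returns Bob twice because department 20 appears twice.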

It seems that the minimal set of logical join types is INNER, LEFT,
RIGHT, OUTER, SEMI, ANTI.

So I think Calcite could have one LogicalJoin operator with an
attribute to specify the join type (from the above), and a number of
physical join operators (hash, merge, nested loops) whose
implementation details depend on the join type.
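
A rough sketch of that shape, in Python rather than Java and not
modeled on Calcite's actual classes: JoinType, LogicalJoin,
nested_loops_join, hash_join and the _emit helper are all hypothetical
names. It assumes an equi-join on a single key with no NULL keys, and
it uses FULL for what is presumably meant by OUTER above.

```python
from dataclasses import dataclass
from enum import Enum


class JoinType(Enum):
    INNER = "inner"
    LEFT = "left"
    RIGHT = "right"
    FULL = "full"  # listed as OUTER in the message; assumed to mean full outer
    SEMI = "semi"
    ANTI = "anti"


@dataclass(frozen=True)
class LogicalJoin:
    """One logical operator; the join type is just an attribute."""
    join_type: JoinType
    left_key: str
    right_key: str


def _emit(join_type, left, right, matches_of):
    """Per-type output logic shared by the physical operators.

    matches_of(l) returns the indices of the right rows matching left row l.
    """
    out, matched_right = [], set()
    for l in left:
        idx = matches_of(l)
        matched_right.update(idx)
        if join_type is JoinType.SEMI:
            if idx:
                out.append(l)
        elif join_type is JoinType.ANTI:
            if not idx:
                out.append(l)
        else:
            out.extend((l, right[i]) for i in idx)
            if not idx and join_type in (JoinType.LEFT, JoinType.FULL):
                out.append((l, None))  # pad the unmatched left row
    if join_type in (JoinType.RIGHT, JoinType.FULL):
        out.extend((None, r) for i, r in enumerate(right)
                   if i not in matched_right)  # pad unmatched right rows
    return out


def nested_loops_join(join, left, right):
    """Physical operator: scan the whole right input for every left row."""
    def matches_of(l):
        return [i for i, r in enumerate(right)
                if l[join.left_key] == r[join.right_key]]
    return _emit(join.join_type, left, right, matches_of)


def hash_join(join, left, right):
    """Physical operator: build a hash table on the right input, then probe."""
    table = {}
    for i, r in enumerate(right):
        table.setdefault(r[join.right_key], []).append(i)

    def matches_of(l):
        return table.get(l[join.left_key], [])
    return _emit(join.join_type, left, right, matches_of)
```

For example, hash_join(LogicalJoin(JoinType.SEMI, "k", "k"), left, right)
and the nested_loops_join call with the same arguments return the same
rows; only the way matches are found differs. A merge join is omitted
to keep the sketch short.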

What we lose by this model is the structure of the query (whether
there was a sub-plan or not), but I would say that this is actually
what is desired from a logical representation -- to abstract away from
how the query is written, and how it is structured, as long as there
is a canonical representation. There could also be a world where both
models coexist (Correlate first then Decorrelate but in the light of a
single logical join operator?).

@Haisheng, generally, a sub-plan can also be implemented using a
variant of hash or merge joins as long as we evaluate the sub-plan
independently (without the join predicate), but that is up to the
optimizer.
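
As a small illustration of that last point (again a Python sketch with
made-up data and hypothetical names such as correlate_semi, hash_semi
and scan_vip, not Calcite or Postgres code): the correlated form
re-runs the sub-plan for every outer row, while the decorrelated form
runs it once without the join predicate and then probes a hash set.

```python
orders = [
    {"id": 1, "cust": "a"},
    {"id": 2, "cust": "b"},
    {"id": 3, "cust": "c"},
]
vip = [{"cust": "a"}, {"cust": "c"}]
scans = {"count": 0}  # counts how often the sub-plan input is scanned


def scan_vip():
    scans["count"] += 1
    return list(vip)


def correlate_semi(outer, subplan_for):
    """Correlate style: evaluate the sub-plan once per outer row."""
    return [o for o in outer if subplan_for(o)]


def hash_semi(outer, subplan_rows, outer_key, inner_key):
    """Decorrelated style: sub-plan evaluated once, then hash probes."""
    keys = {r[inner_key] for r in subplan_rows}
    return [o for o in outer if o[outer_key] in keys]


# Correlated: the join predicate lives inside the sub-plan.
correlated = correlate_semi(
    orders, lambda o: [v for v in scan_vip() if v["cust"] == o["cust"]])
correlated_scans = scans["count"]

# Decorrelated: the sub-plan is evaluated independently of the outer row.
scans["count"] = 0
decorrelated = hash_semi(orders, scan_vip(), "cust", "cust")
decorrelated_scans = scans["count"]
```

Both variants return the same orders; the correlated one scans the
sub-plan input once per outer row, the decorrelated one only once.
Which of the two is cheaper in a given plan is for the optimizer to
decide.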

Thanks,
Walaa.

On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <[email protected]> wrote:
>
> SemiJoinType and its relationship with JoinRelType do confuse me a little bit.
>
> But I don’t think we should drop LogicalCorrelate. It is useful to
> represent the lateral or correlated subquery (aka SubPlan in Postgres
> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in Calcite,
> or SubPlan in Postgres’s terminology, but it can’t be implemented as HashJoin
> or MergeJoin.
>
> Thanks ~
> Haisheng Yuan
> ------------------------------------------------------------------
> From: Stamatis Zampetakis<[email protected]>
> Date: 2019-03-21 07:13:15
> To: <[email protected]>
> Subject: Re: Join, SemiJoin, Correlate
>
> I have bumped into this quite a few times and I think we should really try
> to improve the design of the join hierarchy.
>
> From a logical point of view I think it makes sense to have the following
> operators:
> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
>
> Yet I have not thought thoroughly about what should become a class, and
> what a property of the class (e.g., JoinRelType, SemiJoinType).
>
> Moreover, Correlate as it is right now, is basically a nested loop join (as
> its Javadoc also indicates).
> Nested loop join is most often encountered as a physical operator so I am
> not sure if it should remain as is (in particular the LogicalCorrelate).
> As we do not have HashJoin, MergeJoin, etc., operators at the logical
> level, I think we should not have a NestedLoopJoin (aka., LogicalCorrelate).
> There are valid reasons why Correlate was introduced in the first place but
> I think we should rethink a bit the design and the needs.
>
> @Julian: I do not know to what extent you would like to rethink the
> hierarchy but I have the impression that even small changes can easily
> break backward compatibility.
>
>
> On Wed, 20 Mar 2019 at 8:07 PM, Julian Hyde <[email protected]> wrote:
>
> > I just discovered that Correlate, which is neither a Join nor a SemiJoin,
> > uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >
> > Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> >
> > Julian
> >
> >
> >
>
