I somewhat prefer option 3, for the reasons below:

1. It makes the concept unified and clean.
2. We cannot simply remove Correlate, because it is a good logical
representation of a correlated query: it can be de-correlated into a HashJoin
or kept as a NestedLoopJoin. Functionally, it seems similar to Hive's
QBSubQuery[1], and Hive uses JoinType[2] to enumerate the different joins. An
enumeration is a much cleaner concept than a set of functional methods.
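
To make option 3 concrete, here is a minimal Java sketch of what a single
join-type enumeration with SEMI and ANTI values might look like. The enum name
JoinKindSketch, its method names and the outputFields helper are hypothetical,
made up for illustration; this is not Calcite's JoinRelType, and plain field
names stand in for a real row type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: one enum for every join flavor in this thread.
// Names and signatures are assumptions, not existing Calcite API.
enum JoinKindSketch {
  //    nullsOnLeft, nullsOnRight, projectsRight
  INNER(false, false, true),
  LEFT(false, true, true),
  RIGHT(true, false, true),
  FULL(true, true, true),
  SEMI(false, false, false),
  ANTI(false, false, false);

  private final boolean nullsOnLeft;
  private final boolean nullsOnRight;
  private final boolean projectsRight;

  JoinKindSketch(boolean nullsOnLeft, boolean nullsOnRight,
      boolean projectsRight) {
    this.nullsOnLeft = nullsOnLeft;
    this.nullsOnRight = nullsOnRight;
    this.projectsRight = projectsRight;
  }

  /** Whether left-side fields may be null-padded in the output. */
  boolean generatesNullsOnLeft() {
    return nullsOnLeft;
  }

  /** Whether right-side fields may be null-padded in the output. */
  boolean generatesNullsOnRight() {
    return nullsOnRight;
  }

  /** Whether the output row contains fields from the right input. */
  boolean projectsRight() {
    return projectsRight;
  }

  /** Output field names: left only for SEMI/ANTI, left + right otherwise. */
  List<String> outputFields(List<String> left, List<String> right) {
    List<String> fields = new ArrayList<>(left);
    if (projectsRight) {
      fields.addAll(right);
    }
    return fields;
  }
}
```

With left fields [deptno, name] and right fields [empno], SEMI and ANTI would
keep only the two left fields, while INNER, LEFT, RIGHT and FULL would return
all three.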

[1]
https://github.com/apache/hive/blob/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java#L41
[2]
https://github.com/apache/hive/blob/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53/ql/src/java/org/apache/hadoop/hive/ql/parse/JoinType.java#L26

Best,
Danny Chan

Julian Hyde <[email protected]> wrote on Fri, Mar 22, 2019 at 2:55 AM:

> I have a few ideas for refactorings. (I’m not convinced by any of them,
> but let me know which you like.)
>
> 1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin,
> it is used by Correlate, but in a field called joinType).
>
> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has
> the same set of values as SemiJoinType, but it has a better name.
>
> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType
> for everything. We would have to add SEMI and ANTI values. Also some
> methods to find out whether the resulting row type contains fields from the
> left and right inputs or just the left input.
>
> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and
> Correlate implement it. It would have methods that say whether the LHS
> and RHS generate nulls, and whether the output row type contains columns
> from the right input. This seems attractive because it lets Join, SemiJoin
> and Correlate continue to be structurally different.
>
> Julian
>
>
>
>
> > On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]>
> wrote:
> >
> > SubPlan (in Postgres terms) is a Postgres physical relational node that
> evaluates a correlated subquery. What I mean is that a correlated subquery
> that can’t be decorrelated can’t be implemented by a hash join or merge
> join. But that is off topic.
> >
> > Thanks ~
> > Haisheng Yuan
> > ------------------------------------------------------------------
> > From: Walaa Eldin Moustafa<[email protected]>
> > Date: 2019-03-21 09:31:41
> > To: <[email protected]>
> > Cc: Stamatis Zampetakis<[email protected]>
> > Subject: Re: Re: Join, SemiJoin, Correlate
> >
> > Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> > NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> > join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> > rewritten as semi/anti joins, and 2) nested loops join is more of a
> > physical operator.
> >
> > It seems that the minimal set of logical join types is INNER, LEFT,
> > RIGHT, OUTER, SEMI, ANTI.
> >
> > So I think Calcite could have one LogicalJoin operator with an
> > attribute to specify the join type (from the above), and a number of
> > physical join operators (hash, merge, nested loops) whose
> > implementation details depend on the join type.
> >
> > What we lose by this model is the structure of the query (whether
> > there was a sub-plan or not), but I would say that this is actually
> > what is desired from a logical representation -- to abstract away from
> > how the query is written, and how it is structured, as long as there
> > is a canonical representation. There could also be a world where both
> > models coexist (Correlate first then Decorrelate but in the light of a
> > single logical join operator?).
> >
> > @Haisheng, generally, a sub-plan can also be implemented using a
> > variant of hash or merge joins as long as we evaluate the sub-plan
> > independently (without the join predicate), but that is up to the
> > optimizer.
> >
> > Thanks,
> > Walaa.
> >
> > On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <[email protected]>
> wrote:
> >>
> >> SemiJoinType and its relationship with JoinRelType do confuse me a
> little bit.
> >>
> >> But I don’t think we should drop LogicalCorrelate. It is useful to
> represent the lateral or correlated subquery (aka SubPlan in Postgres
> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented
> as HashJoin or MergeJoin.
> >>
> >> Thanks ~
> >> Haisheng Yuan
> >> ------------------------------------------------------------------
> >> From: Stamatis Zampetakis<[email protected]>
> >> Date: 2019-03-21 07:13:15
> >> To: <[email protected]>
> >> Subject: Re: Join, SemiJoin, Correlate
> >>
> >> I have bumped into this quite a few times and I think we should really
> try
> >> to improve the design of the join hierarchy.
> >>
> >> From a logical point of view I think it makes sense to have the
> following
> >> operators:
> >> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
> >>
> >> Yet I have not thought thoroughly what should become a class, and what a
> >> property of the class (e.g., JoinRelType, SemiJoinType).
> >>
> >> Moreover, Correlate as it is right now, is basically a nested loop join
> (as
> >> its Javadoc also indicates).
> >> Nested loop join is most often encountered as a physical operator so I
> am
> >> not sure if it should remain as is (in particular the LogicalCorrelate).
> >> As we do not have HashJoin, MergeJoin, etc., operators at the logical
> >> level, I think we should not have a NestedLoopJoin (aka.,
> LogicalCorrelate).
> >> There are valid reasons why Correlate was introduced in the first place
> but
> >> I think we should rethink a bit the design and the needs.
> >>
> >> @Julian: I do not know to what extent you would like to rethink the
> >> hierarchy but I have the impression that even small changes can easily
> >> break backward compatibility.
> >>
> >>
> >> On Wed, 20 Mar 2019 at 8:07 PM, Julian Hyde <[email protected]> wrote:
> >>
> >>> I just discovered that Correlate, which is neither a Join nor a
> SemiJoin,
> >>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >>>
> >>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> >>>
> >>> Julian
> >>>
> >>>
> >>>
> >>
>
>
