Since we are discussing this topic I thought it would be could to bring
back
to the surface a similar discussion [1] that has been done a few years ago
in this list.

I am leaning towards option 3 where JoinRelType has all necessary values:
Inner, Left, Semi, Anti, and Full.
With these changes it seems we could remove (deprecate) also SemiJoin, and
EquiJoin.

On the physical level we could have:
1. EnumerableCorrelate or EnumerableNestedLoopJoin;
2. EnumerableMergeJoin;
3. EnumerableHashJoin (currently EnumerableJoin)

and for the above we could pass the JoinRelType throwing an exception when
the specific algorithm cannot be used to implement a specific type of join.

EnumerableSemiJoin and EnumerableThetaJoin could also be removed and
covered from the above I think.

Regarding Correlate and LogicalCorrelate, I am not sure what should we do.
Associating the JoinRelType with it does not seem right, and making
Correlate also a Join is not very attractive either.

Best,
Stamatis

[1]
http://mail-archives.apache.org/mod_mbox/calcite-dev/201411.mbox/%3CCAB%3DJe-H7AWEHbKzjrRHd-YcZgkgWzFORALrz_mMc2k7WDdj54Q%40mail.gmail.com%3E


Στις Πέμ, 21 Μαρ 2019 στις 10:35 μ.μ., ο/η Walaa Eldin Moustafa <
[email protected]> έγραψε:

> > Is your concern with how we have structured the class hierarchy? Or just
> how we describe Correlate in the documentation?
>
> My concern is with both, but mainly the former.
>
> > I do agree that Correlate and nested loops joins are not the same (one
> is logical, the other physical). However, they have a lot in common, in
> particular the fact that one input sets variables and the input reads those
> variables.
>
> I think this commonality describes how the query is written, but not
> necessarily what it is logically equivalent to. It also describes the
> "how", and not necessarily the "what". I would say logical
> representations should be concerned with the "what" part.
>
> > I can’t think of any way to represent a nested loops join (e.g. for each
> department, find all employees in that department) that does not use
> variables to tie together the two inputs. And therefore I am happy with the
> fact that our Java implementation of nested-loops join is ‘class
> EnumerableCorrelate extends Correlate’.
>
> That is correct. The two variables are required. At the logical level
> they are mapped to the Correlate variables, or the Join keys after
> decorrelation. After going to physical, we can only have join keys.
> One of the keys can be the basis for the outer loop and the other for
> the inner loop if needed. That is true for both Correlate and Join
> operators. Both keys can even be used in another way than forming
> nested loops such as using them to implement hash or merge joins
> (again for regular Join or Correlate join after decorrelation).
>
> Thanks,
> Walaa.
>
> On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde <[email protected]> wrote:
> >
> > > In addition, I would not present Correlate
> > > as a nested loops join.
> >
> >
> > Is your concern with how we have structured the class hierarchy? Or just
> how we describe Correlate in the documentation?
> >
> > I do agree that Correlate and nested loops joins are not the same (one
> is logical, the other physical). However, they have a lot in common, in
> particular the fact that one input sets variables and the input reads those
> variables.
> >
> > I can’t think of any way to represent a nested loops join (e.g. for each
> department, find all employees in that department) that does not use
> variables to tie together the two inputs. And therefore I am happy with the
> fact that our Java implementation of nested-loops join is ‘class
> EnumerableCorrelate extends Correlate’.
> >
> >
> > Julian
> >
> > > On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <
> [email protected]> wrote:
> > >
> > > I would vote for number 3. In addition, I would not present Correlate
> > > as a nested loops join. Moreover, nested loops, hash and merge joins
> > > should be able to map to both Join or Correlate logical ones when
> > > possible (no inherent correlation between logical join type and
> > > physical types).
> > >
> > > On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]> wrote:
> > >>
> > >> I have a few ideas for refactorings. (I’m not convinced by any of
> them, but let me know which you like.)
> > >>
> > >> 1. Get rid of SemiJoinType. It is mis-named (it is not used by
> SemiJoin, it is used by Correlate, but in a field called joinType).
> > >>
> > >> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It
> has the same set of values as SemiJoinType, but it has a better name.
> > >>
> > >> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use
> JoinRelType for everything. We would have to add SEMI and ANTI values. Also
> some methods to find out whether the resulting row type contains fields
> from the left and right inputs or just the left input.
> > >>
> > >> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and
> Correlate implement it. It would have a methods that say whether the LHS
> and RHS generate nulls, and whether the output row type contains columns
> from the right input. This seems attractive because it lets Join, SemiJoin
> and Correlate continue to be structurally different.
> > >>
> > >> Julian
> > >>
> > >>
> > >>
> > >>
> > >>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]>
> wrote:
> > >>>
> > >>> SubPlan (in Postgres’ term) is a Postgres physical relational node
> to evaluate correlated subquery. What I mean is correlated subquery that
> can’t be decorrelated can’t be implemented by hashjoin or mergejoin. But it
> is off topic.
> > >>>
> > >>> Thanks ~
> > >>> Haisheng Yuan
> > >>> ------------------------------------------------------------------
> > >>> 发件人:Walaa Eldin Moustafa<[email protected]>
> > >>> 日 期:2019年03月21日 09:31:41
> > >>> 收件人:<[email protected]>
> > >>> 抄 送:Stamatis Zampetakis<[email protected]>
> > >>> 主 题:Re: Re: Join, SemiJoin, Correlate
> > >>>
> > >>> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> > >>> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> > >>> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> > >>> rewritten as semi/anti joins, and 2) nested loops join is more of a
> > >>> physical operator.
> > >>>
> > >>> It seems that the minimal set of logical join types are INNER, LEFT,
> > >>> RIGHT, OUTER, SEMI, ANTI.
> > >>>
> > >>> So I think Calciate could have one LogicalJoin operator with an
> > >>> attribute to specify the join type (from the above), and a number of
> > >>> physical join operators (hash, merge, nested loops) whose
> > >>> implementation details depend on the the join type.
> > >>>
> > >>> What we lose by this model is the structure of the query (whether
> > >>> there was a sub-plan or not), but I would say that this is actually
> > >>> what is desired from a logical representation -- to abstract away
> from
> > >>> how the query is written, and how it is structured, as long as there
> > >>> is a canonical representation. There could also be a world where both
> > >>> models coexist (Correlate first then Decorrelate but in the light of
> a
> > >>> single logical join operator?).
> > >>>
> > >>> @Haisheng, generally, a sub-plan can also be implemented using a
> > >>> variant of hash or merge joins as long as we evaluate the sub-plan
> > >>> independently (without the join predicate), but that is up to the
> > >>> optimizer.
> > >>>
> > >>> Thanks,
> > >>> Walaa.
> > >>>
> > >>> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <
> [email protected]> wrote:
> > >>>>
> > >>>> SemiJoinType and its relationship with JoinRelType do confuse me a
> little bit.
> > >>>>
> > >>>> But I don’t think we should not have LogicalCorrelate. It is useful
> to represent the lateral or correlated subquery (aka SubPlan in Postgres
> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented
> as HashJoin or MergeJoin.
> > >>>>
> > >>>> Thanks ~
> > >>>> Haisheng Yuan
> > >>>> ------------------------------------------------------------------
> > >>>> 发件人:Stamatis Zampetakis<[email protected]>
> > >>>> 日 期:2019年03月21日 07:13:15
> > >>>> 收件人:<[email protected]>
> > >>>> 主 题:Re: Join, SemiJoin, Correlate
> > >>>>
> > >>>> I have bumped into this quite a few times and I think we should
> really try
> > >>>> to improve the design of the join hierarchy.
> > >>>>
> > >>>> From a logical point of view I think it makes sense to have the
> following
> > >>>> operators:
> > >>>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin,
> (GroupJoin)
> > >>>>
> > >>>> Yet I have not thought thoroughly what should become a class, and
> what a
> > >>>> property of the class (e.g., JoinRelType, SemiJoinType).
> > >>>>
> > >>>> Moreover, Correlate as it is right now, is basically a nested loop
> join (as
> > >>>> its Javadoc also indicates).
> > >>>> Nested loop join is most often encountered as a physical operator
> so I am
> > >>>> not sure if it should remain as is (in particular the
> LogicalCorrelate).
> > >>>> As we do not have HashJoin, MergeJoin, etc., operators at the
> logical
> > >>>> level, I think we should not have a NestedLoopJoin (aka.,
> LogicalCorrelate).
> > >>>> There are valid reasons why Correlate was introduced in the first
> place but
> > >>>> I think we should rethink a bit the design and the needs.
> > >>>>
> > >>>> @Julian: I do not know to what extend you would like to rethink the
> > >>>> hierarchy but I have the impression that even small changes can
> easily
> > >>>> break backward compatibility.
> > >>>>
> > >>>>
> > >>>> Στις Τετ, 20 Μαρ 2019 στις 8:07 μ.μ., ο/η Julian Hyde <
> [email protected]>
> > >>>> έγραψε:
> > >>>>
> > >>>>> I just discovered that Correlate, which is neither a Join nor a
> SemiJoin,
> > >>>>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> > >>>>>
> > >>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some
> thought.
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> >
>

Reply via email to