> Is your concern with how we have structured the class hierarchy? Or just how 
> we describe Correlate in the documentation?

My concern is with both, but mainly the former.

> I do agree that Correlate and nested loops joins are not the same (one is 
> logical, the other physical). However, they have a lot in common, in 
> particular the fact that one input sets variables and the other input reads 
> those variables.

I think this commonality describes how the query is written, but not
necessarily what it is logically equivalent to. In other words, it
captures the "how" rather than the "what", and I would say logical
representations should be concerned with the "what".

> I can’t think of any way to represent a nested loops join (e.g. for each 
> department, find all employees in that department) that does not use 
> variables to tie together the two inputs. And therefore I am happy with the 
> fact that our Java implementation of nested-loops join is ‘class 
> EnumerableCorrelate extends Correlate’.

That is correct: the two variables are required. At the logical level
they map to the Correlate variables, or to the Join keys after
decorrelation. Once we go to the physical level, we can only have join
keys. One of the keys can drive the outer loop and the other the inner
loop if needed, and that is true for both the Correlate and Join
operators. The keys can also be used in ways other than forming nested
loops, such as implementing hash or merge joins (again, for a regular
Join or for a Correlate after decorrelation).
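
To make the point about keys concrete, here is a rough sketch in plain
Java. It is not Calcite code; the class, method and field names
(JoinSketch, Dept, Emp, nestedLoopsJoin, hashJoin) are made up for
illustration. Both methods answer the same logical question, "for each
department, find all employees in that department", and differ only in
how the two keys are used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Calcite.
class JoinSketch {
  static final class Dept {
    final int deptno;
    final String name;
    Dept(int deptno, String name) { this.deptno = deptno; this.name = name; }
  }

  static final class Emp {
    final String name;
    final int deptno;
    Emp(String name, int deptno) { this.name = name; this.deptno = deptno; }
  }

  // Nested loops: the outer row sets a "variable" (its key) and the
  // inner loop reads it while scanning the other input.
  static List<String> nestedLoopsJoin(List<Dept> depts, List<Emp> emps) {
    List<String> out = new ArrayList<>();
    for (Dept d : depts) {
      int key = d.deptno;            // set by the outer input
      for (Emp e : emps) {
        if (e.deptno == key) {       // read by the inner input
          out.add(d.name + ":" + e.name);
        }
      }
    }
    return out;
  }

  // Hash join: the same two keys, but one side is hashed once and the
  // other side probes the table instead of rescanning.
  static List<String> hashJoin(List<Dept> depts, List<Emp> emps) {
    Map<Integer, List<Emp>> table = new HashMap<>();
    for (Emp e : emps) {
      table.computeIfAbsent(e.deptno, k -> new ArrayList<>()).add(e);
    }
    List<String> out = new ArrayList<>();
    for (Dept d : depts) {
      List<Emp> matches = table.getOrDefault(d.deptno, Collections.<Emp>emptyList());
      for (Emp e : matches) {
        out.add(d.name + ":" + e.name);
      }
    }
    return out;
  }
}
```

Both methods return the same rows for the same inputs, which is the
sense in which the choice between them looks like a physical decision
rather than a logical one.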

Thanks,
Walaa.

On Thu, Mar 21, 2019 at 2:08 PM Julian Hyde <[email protected]> wrote:
>
> > In addition, I would not present Correlate
> > as a nested loops join.
>
>
> Is your concern with how we have structured the class hierarchy? Or just how 
> we describe Correlate in the documentation?
>
> I do agree that Correlate and nested loops joins are not the same (one is 
> logical, the other physical). However, they have a lot in common, in 
> particular the fact that one input sets variables and the other input reads 
> those variables.
>
> I can’t think of any way to represent a nested loops join (e.g. for each 
> department, find all employees in that department) that does not use 
> variables to tie together the two inputs. And therefore I am happy with the 
> fact that our Java implementation of nested-loops join is ‘class 
> EnumerableCorrelate extends Correlate’.
>
>
> Julian
>
> > On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <[email protected]> 
> > wrote:
> >
> > I would vote for number 3. In addition, I would not present Correlate
> > as a nested loops join. Moreover, nested loops, hash and merge joins
> > should be able to map to both Join or Correlate logical ones when
> > possible (no inherent correlation between logical join type and
> > physical types).
> >
> > On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]> wrote:
> >>
> >> I have a few ideas for refactorings. (I’m not convinced by any of them, 
> >> but let me know which you like.)
> >>
> >> 1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin, 
> >> it is used by Correlate, but in a field called joinType).
> >>
> >> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has 
> >> the same set of values as SemiJoinType, but it has a better name.
> >>
> >> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType 
> >> for everything. We would have to add SEMI and ANTI values. Also some 
> >> methods to find out whether the resulting row type contains fields from 
> >> the left and right inputs or just the left input.
> >>
> >> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and 
> >> Correlate implement it. It would have methods that say whether the LHS 
> >> and RHS generate nulls, and whether the output row type contains columns 
> >> from the right input. This seems attractive because it lets Join, SemiJoin 
> >> and Correlate continue to be structurally different.
> >>
> >> Julian
> >>
> >>
> >>
> >>
> >>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]> wrote:
> >>>
> >>> SubPlan (in Postgres terms) is a Postgres physical relational node that 
> >>> evaluates a correlated subquery. What I mean is that a correlated 
> >>> subquery that can’t be decorrelated can’t be implemented by hash join or 
> >>> merge join. But that is off topic.
> >>>
> >>> Thanks ~
> >>> Haisheng Yuan
> >>> ------------------------------------------------------------------
> >>> From: Walaa Eldin Moustafa<[email protected]>
> >>> Date: 2019-03-21 09:31:41
> >>> To: <[email protected]>
> >>> Cc: Stamatis Zampetakis<[email protected]>
> >>> Subject: Re: Re: Join, SemiJoin, Correlate
> >>>
> >>> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> >>> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> >>> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> >>> rewritten as semi/anti joins, and 2) nested loops join is more of a
> >>> physical operator.
> >>>
> >>> It seems that the minimal set of logical join types are INNER, LEFT,
> >>> RIGHT, OUTER, SEMI, ANTI.
> >>>
> >>> So I think Calcite could have one LogicalJoin operator with an
> >>> attribute to specify the join type (from the above), and a number of
> >>> physical join operators (hash, merge, nested loops) whose
> >>> implementation details depend on the join type.
> >>>
> >>> What we lose by this model is the structure of the query (whether
> >>> there was a sub-plan or not), but I would say that this is actually
> >>> what is desired from a logical representation -- to abstract away from
> >>> how the query is written, and how it is structured, as long as there
> >>> is a canonical representation. There could also be a world where both
> >>> models coexist (Correlate first then Decorrelate but in the light of a
> >>> single logical join operator?).
> >>>
> >>> @Haisheng, generally, a sub-plan can also be implemented using a
> >>> variant of hash or merge joins as long as we evaluate the sub-plan
> >>> independently (without the join predicate), but that is up to the
> >>> optimizer.
> >>>
> >>> Thanks,
> >>> Walaa.
> >>>
> >>> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <[email protected]> 
> >>> wrote:
> >>>>
> >>>> SemiJoinType and its relationship with JoinRelType do confuse me a 
> >>>> little bit.
> >>>>
> >>>> But I don’t think we should get rid of LogicalCorrelate. It is useful 
> >>>> to represent a lateral or correlated subquery (aka SubPlan in Postgres 
> >>>> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in 
> >>>> Calcite, or SubPlan in Postgres’s terminology, but it can’t be 
> >>>> implemented as HashJoin or MergeJoin.
> >>>>
> >>>> Thanks ~
> >>>> Haisheng Yuan
> >>>> ------------------------------------------------------------------
> >>>> From: Stamatis Zampetakis<[email protected]>
> >>>> Date: 2019-03-21 07:13:15
> >>>> To: <[email protected]>
> >>>> Subject: Re: Join, SemiJoin, Correlate
> >>>>
> >>>> I have bumped into this quite a few times and I think we should really
> >>>> try to improve the design of the join hierarchy.
> >>>>
> >>>> From a logical point of view I think it makes sense to have the following
> >>>> operators:
> >>>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
> >>>>
> >>>> Yet I have not thought thoroughly what should become a class, and what a
> >>>> property of the class (e.g., JoinRelType, SemiJoinType).
> >>>>
> >>>> Moreover, Correlate as it is right now, is basically a nested loop join
> >>>> (as its Javadoc also indicates).
> >>>> Nested loop join is most often encountered as a physical operator so I am
> >>>> not sure if it should remain as is (in particular the LogicalCorrelate).
> >>>> As we do not have HashJoin, MergeJoin, etc., operators at the logical
> >>>> level, I think we should not have a NestedLoopJoin (aka
> >>>> LogicalCorrelate).
> >>>> There are valid reasons why Correlate was introduced in the first place
> >>>> but I think we should rethink a bit the design and the needs.
> >>>>
> >>>> @Julian: I do not know to what extent you would like to rethink the
> >>>> hierarchy but I have the impression that even small changes can easily
> >>>> break backward compatibility.
> >>>>
> >>>>
> >>>> On Wed, Mar 20, 2019 at 8:07 PM, Julian Hyde <[email protected]> wrote:
> >>>>
> >>>>> I just discovered that Correlate, which is neither a Join nor a
> >>>>> SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >>>>>
> >>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
>
