> In addition, I would not present Correlate
> as a nested loops join.

Is your concern with how we have structured the class hierarchy? Or just how we describe Correlate in the documentation?

I do agree that Correlate and nested loops joins are not the same (one is logical, the other physical). However, they have a lot in common, in particular the fact that one input sets variables and the other input reads those variables. I can’t think of any way to represent a nested loops join (e.g. for each department, find all employees in that department) that does not use variables to tie together the two inputs. And therefore I am happy with the fact that our Java implementation of nested-loops join is ‘class EnumerableCorrelate extends Correlate’.

Julian

> On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <[email protected]>
> wrote:
>
> I would vote for number 3. In addition, I would not present Correlate
> as a nested loops join. Moreover, nested loops, hash and merge joins
> should be able to map to either Join or Correlate logical ones when
> possible (no inherent correlation between logical join type and
> physical types).
>
> On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]> wrote:
>>
>> I have a few ideas for refactorings. (I’m not convinced by any of them, but
>> let me know which you like.)
>>
>> 1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin, it
>> is used by Correlate, but in a field called joinType).
>>
>> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has the
>> same set of values as SemiJoinType, but it has a better name.
>>
>> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType
>> for everything. We would have to add SEMI and ANTI values. Also some methods
>> to find out whether the resulting row type contains fields from the left and
>> right inputs or just the left input.
>>
>> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and
>> Correlate implement it.
>> It would have methods that say whether the LHS and
>> RHS generate nulls, and whether the output row type contains columns from
>> the right input. This seems attractive because it lets Join, SemiJoin and
>> Correlate continue to be structurally different.
>>
>> Julian
>>
>>
>>
>>
>>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]> wrote:
>>>
>>> SubPlan (in Postgres’ terms) is a Postgres physical relational node to
>>> evaluate a correlated subquery. What I mean is that a correlated subquery
>>> that can’t be decorrelated can’t be implemented by hashjoin or mergejoin.
>>> But it is off topic.
>>>
>>> Thanks ~
>>> Haisheng Yuan
>>> ------------------------------------------------------------------
>>> From: Walaa Eldin Moustafa <[email protected]>
>>> Date: 2019-03-21 09:31:41
>>> To: <[email protected]>
>>> Cc: Stamatis Zampetakis <[email protected]>
>>> Subject: Re: Re: Join, SemiJoin, Correlate
>>>
>>> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
>>> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
>>> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
>>> rewritten as semi/anti joins, and 2) nested loops join is more of a
>>> physical operator.
>>>
>>> It seems that the minimal set of logical join types is INNER, LEFT,
>>> RIGHT, OUTER, SEMI, ANTI.
>>>
>>> So I think Calcite could have one LogicalJoin operator with an
>>> attribute to specify the join type (from the above), and a number of
>>> physical join operators (hash, merge, nested loops) whose
>>> implementation details depend on the join type.
>>>
>>> What we lose by this model is the structure of the query (whether
>>> there was a sub-plan or not), but I would say that this is actually
>>> what is desired from a logical representation -- to abstract away from
>>> how the query is written, and how it is structured, as long as there
>>> is a canonical representation.
>>> There could also be a world where both
>>> models coexist (Correlate first then Decorrelate but in the light of a
>>> single logical join operator?).
>>>
>>> @Haisheng, generally, a sub-plan can also be implemented using a
>>> variant of hash or merge joins as long as we evaluate the sub-plan
>>> independently (without the join predicate), but that is up to the
>>> optimizer.
>>>
>>> Thanks,
>>> Walaa.
>>>
>>> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <[email protected]>
>>> wrote:
>>>>
>>>> SemiJoinType and its relationship with JoinRelType do confuse me a little
>>>> bit.
>>>>
>>>> But I don’t think we should get rid of LogicalCorrelate. It is useful to
>>>> represent the lateral or correlated subquery (aka SubPlan in Postgres
>>>> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
>>>> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented
>>>> as HashJoin or MergeJoin.
>>>>
>>>> Thanks ~
>>>> Haisheng Yuan
>>>> ------------------------------------------------------------------
>>>> From: Stamatis Zampetakis <[email protected]>
>>>> Date: 2019-03-21 07:13:15
>>>> To: <[email protected]>
>>>> Subject: Re: Join, SemiJoin, Correlate
>>>>
>>>> I have bumped into this quite a few times and I think we should really try
>>>> to improve the design of the join hierarchy.
>>>>
>>>> From a logical point of view I think it makes sense to have the following
>>>> operators:
>>>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
>>>>
>>>> Yet I have not thought thoroughly about what should become a class, and
>>>> what a property of the class (e.g., JoinRelType, SemiJoinType).
>>>>
>>>> Moreover, Correlate as it is right now, is basically a nested loop join (as
>>>> its Javadoc also indicates).
>>>> Nested loop join is most often encountered as a physical operator so I am
>>>> not sure if it should remain as is (in particular the LogicalCorrelate).
>>>> As we do not have HashJoin, MergeJoin, etc., operators at the logical
>>>> level, I think we should not have a NestedLoopJoin (aka.,
>>>> LogicalCorrelate).
>>>> There are valid reasons why Correlate was introduced in the first place but
>>>> I think we should rethink a bit the design and the needs.
>>>>
>>>> @Julian: I do not know to what extent you would like to rethink the
>>>> hierarchy but I have the impression that even small changes can easily
>>>> break backward compatibility.
>>>>
>>>>
>>>> On Wed, 20 Mar 2019 at 8:07 PM, Julian Hyde <[email protected]>
>>>> wrote:
>>>>
>>>>> I just discovered that Correlate, which is neither a Join nor a SemiJoin,
>>>>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
>>>>>
>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
>>>>>
>>>>> Julian
>>>>>
>>>>>
>>>>>
>>>>
>>
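
To make the "one input sets variables, the other reads them" point from the top of the thread concrete, here is a minimal sketch of a correlate-style nested loops join over in-memory lists. It is not Calcite's EnumerableCorrelate; the class, the row types and the names `cor0`, `employeesOf` and `correlate` are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch; not Calcite code.
final class CorrelateSketch {
  static final class Dept {
    final int deptno;
    final String name;
    Dept(int deptno, String name) { this.deptno = deptno; this.name = name; }
  }

  static final class Emp {
    final String name;
    final int deptno;
    Emp(String name, int deptno) { this.name = name; this.deptno = deptno; }
  }

  static final List<Dept> DEPTS = Arrays.asList(
      new Dept(10, "Sales"), new Dept(20, "Eng"), new Dept(30, "Ops"));
  static final List<Emp> EMPS = Arrays.asList(
      new Emp("Alice", 10), new Emp("Bob", 20), new Emp("Carol", 10));

  // The right input: it reads the variable (cor0) that the left input sets.
  static List<Emp> employeesOf(Dept cor0) {
    List<Emp> result = new ArrayList<>();
    for (Emp e : EMPS) {
      if (e.deptno == cor0.deptno) {
        result.add(e);
      }
    }
    return result;
  }

  // Generic nested loop with inner-join semantics: the right side is
  // re-evaluated once per left row, with that row bound as the variable.
  static <L, R, T> List<T> correlate(List<L> left,
      Function<L, List<R>> right, BiFunction<L, R, T> project) {
    List<T> out = new ArrayList<>();
    for (L l : left) {
      for (R r : right.apply(l)) {
        out.add(project.apply(l, r));
      }
    }
    return out;
  }

  // "For each department, find all employees in that department."
  static List<String> run() {
    return correlate(DEPTS, CorrelateSketch::employeesOf,
        (Dept d, Emp e) -> d.name + ": " + e.name);
  }
}
```

In this sketch a department with no employees produces no output rows; a left, semi or anti variant would differ only in what the outer loop emits for each left row.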

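
Ideas 3 and 4 in the thread both rest on the same few questions about a join type: does the left or right side generate nulls, and does the output row type include the right input's columns? A rough sketch of a single enum that answers them might look like the following. The enum `JoinKind` and its method names are assumptions made for illustration, not Calcite's JoinRelType.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum for illustration; not Calcite's JoinRelType.
enum JoinKind {
  INNER(false, false, true),
  LEFT(false, true, true),
  RIGHT(true, false, true),
  FULL(true, true, true),
  SEMI(false, false, false),
  ANTI(false, false, false);

  private final boolean nullsOnLeft;
  private final boolean nullsOnRight;
  private final boolean projectsRight;

  JoinKind(boolean nullsOnLeft, boolean nullsOnRight, boolean projectsRight) {
    this.nullsOnLeft = nullsOnLeft;
    this.nullsOnRight = nullsOnRight;
    this.projectsRight = projectsRight;
  }

  // Whether left-side columns may be null-padded in the output.
  boolean generatesNullsOnLeft() { return nullsOnLeft; }

  // Whether right-side columns may be null-padded in the output.
  boolean generatesNullsOnRight() { return nullsOnRight; }

  // Whether the output row type contains the right input's columns.
  boolean projectsRight() { return projectsRight; }

  // Output field names: left fields, plus right fields unless SEMI or ANTI.
  List<String> outputFields(List<String> left, List<String> right) {
    List<String> out = new ArrayList<>(left);
    if (projectsRight) {
      out.addAll(right);
    }
    return out;
  }
}
```

Under idea 4 the same three questions would instead be methods on a shared interface, which would let Join, SemiJoin and Correlate stay structurally different while answering them uniformly.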