I have a few ideas for refactorings. (I’m not convinced by any of them, but let me know which you like.)

1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin, it is used by Correlate, but in a field called joinType).

2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has the same set of values as SemiJoinType, but it has a better name.

3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType for everything. We would have to add SEMI and ANTI values, and also some methods to find out whether the resulting row type contains fields from the left and right inputs or just the left input.

4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and Correlate implement it. It would have methods that say whether the LHS and RHS generate nulls, and whether the output row type contains columns from the right input. This seems attractive because it lets Join, SemiJoin and Correlate continue to be structurally different.

Julian

> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]> wrote:
>
> SubPlan (in Postgres terms) is a Postgres physical relational node to
> evaluate a correlated subquery. What I mean is that a correlated subquery
> that can’t be decorrelated can’t be implemented by hash join or merge join.
> But that is off topic.
>
> Thanks ~
> Haisheng Yuan
> ------------------------------------------------------------------
> From: Walaa Eldin Moustafa<[email protected]>
> Date: 2019-03-21 09:31:41
> To: <[email protected]>
> Cc: Stamatis Zampetakis<[email protected]>
> Subject: Re: Re: Join, SemiJoin, Correlate
>
> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> rewritten as semi/anti joins, and 2) nested loops join is more of a
> physical operator.
>
> It seems that the minimal set of logical join types is INNER, LEFT,
> RIGHT, OUTER, SEMI, ANTI.
>
> So I think Calcite could have one LogicalJoin operator with an
> attribute to specify the join type (from the above), and a number of
> physical join operators (hash, merge, nested loops) whose
> implementation details depend on the join type.
>
> What we lose by this model is the structure of the query (whether
> there was a sub-plan or not), but I would say that this is actually
> what is desired from a logical representation -- to abstract away from
> how the query is written, and how it is structured, as long as there
> is a canonical representation. There could also be a world where both
> models coexist (Correlate first then Decorrelate but in the light of a
> single logical join operator?).
>
> @Haisheng, generally, a sub-plan can also be implemented using a
> variant of hash or merge joins as long as we evaluate the sub-plan
> independently (without the join predicate), but that is up to the
> optimizer.
>
> Thanks,
> Walaa.
>
> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <[email protected]> wrote:
>>
>> SemiJoinType and its relationship with JoinRelType do confuse me a little
>> bit.
>>
>> But I don’t think we should get rid of LogicalCorrelate. It is useful to
>> represent the lateral or correlated subquery (aka SubPlan in Postgres
>> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in Calcite,
>> or SubPlan in Postgres’s terminology, but it can’t be implemented as
>> HashJoin or MergeJoin.
>>
>> Thanks ~
>> Haisheng Yuan
>> ------------------------------------------------------------------
>> From: Stamatis Zampetakis<[email protected]>
>> Date: 2019-03-21 07:13:15
>> To: <[email protected]>
>> Subject: Re: Join, SemiJoin, Correlate
>>
>> I have bumped into this quite a few times and I think we should really try
>> to improve the design of the join hierarchy.
>>
>> From a logical point of view I think it makes sense to have the following
>> operators:
>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
>>
>> Yet I have not thought thoroughly about what should become a class, and
>> what a property of the class (e.g., JoinRelType, SemiJoinType).
>>
>> Moreover, Correlate, as it is right now, is basically a nested loop join (as
>> its Javadoc also indicates).
>> Nested loop join is most often encountered as a physical operator so I am
>> not sure if it should remain as is (in particular the LogicalCorrelate).
>> As we do not have HashJoin, MergeJoin, etc., operators at the logical
>> level, I think we should not have a NestedLoopJoin (aka LogicalCorrelate).
>> There are valid reasons why Correlate was introduced in the first place but
>> I think we should rethink the design and the needs a bit.
>>
>> @Julian: I do not know to what extent you would like to rethink the
>> hierarchy but I have the impression that even small changes can easily
>> break backward compatibility.
>>
>>
>> On Wed, 20 Mar 2019 at 8:07 PM, Julian Hyde <[email protected]> wrote:
>>
>>> I just discovered that Correlate, which is neither a Join nor a SemiJoin,
>>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
>>>
>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
>>>
>>> Julian
>>>
>>>
>>>
>>
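
P.S. To make ideas 3 and 4 at the top of this message a bit more concrete, here is a rough sketch of how they might fit together. It is only an illustration under assumptions: it does not use Calcite's classes, JoinKind and ToySemiJoin are hypothetical stand-ins, the method names are guesses at what such an API could look like, and a real JoinLike would extend BiRel as proposed rather than stand alone.

```java
/** Hypothetical unified join-type enum (idea 3): adds SEMI and ANTI. */
enum JoinKind {
  INNER, LEFT, RIGHT, FULL, SEMI, ANTI;

  /** True if rows may be null-padded on the left side (RIGHT or FULL). */
  boolean generatesNullsOnLeft() {
    return this == RIGHT || this == FULL;
  }

  /** True if rows may be null-padded on the right side (LEFT or FULL). */
  boolean generatesNullsOnRight() {
    return this == LEFT || this == FULL;
  }

  /** True if the output row type contains columns from the right input. */
  boolean projectsRight() {
    return this != SEMI && this != ANTI;
  }
}

/** Hypothetical JoinLike (idea 4); in Calcite it would extend BiRel. */
interface JoinLike {
  JoinKind kind();

  default boolean generatesNullsOnLeft() {
    return kind().generatesNullsOnLeft();
  }

  default boolean generatesNullsOnRight() {
    return kind().generatesNullsOnRight();
  }

  default boolean projectsRight() {
    return kind().projectsRight();
  }
}

/** Toy operator, for illustration only: a semi-join keeps only left columns. */
final class ToySemiJoin implements JoinLike {
  @Override public JoinKind kind() {
    return JoinKind.SEMI;
  }
}
```

With something along these lines, a rule could ask any JoinLike whether the right input's columns appear in the output without caring whether the node is a Join, a SemiJoin or a Correlate.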
