> In addition, I would not present Correlate
> as a nested loops join.


Is your concern with how we have structured the class hierarchy? Or just how we 
describe Correlate in the documentation?

I do agree that Correlate and nested loops joins are not the same (one is 
logical, the other physical). However, they have a lot in common, in particular 
the fact that one input sets variables and the other input reads those variables.

I can’t think of any way to represent a nested loops join (e.g. for each 
department, find all employees in that department) that does not use variables 
to tie together the two inputs. And therefore I am happy with the fact that our 
Java implementation of nested-loops join is ‘class EnumerableCorrelate extends 
Correlate’.
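
To make the "one input sets variables, the other reads them" point concrete, here is a minimal sketch in plain Java. It is not Calcite code: Dept, Emp, cor0 and CorrelateSketch are hypothetical names I made up for illustration, and the right input is modeled simply as a function of the current left row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these types exist in Calcite.
final class CorrelateSketch {
  static final class Dept {
    final int deptno;
    final String name;
    Dept(int deptno, String name) { this.deptno = deptno; this.name = name; }
  }

  static final class Emp {
    final String name;
    final int deptno;
    Emp(String name, int deptno) { this.name = name; this.deptno = deptno; }
  }

  static final List<Emp> EMPS = Arrays.asList(
      new Emp("Alice", 10), new Emp("Bob", 20), new Emp("Carol", 10));

  // Right input: re-evaluated for each value of the correlation variable cor0.
  static List<Emp> empsOf(Dept cor0) {
    List<Emp> out = new ArrayList<>();
    for (Emp e : EMPS) {
      if (e.deptno == cor0.deptno) { // the right side READS the variable
        out.add(e);
      }
    }
    return out;
  }

  // Nested loops: the outer loop SETS the variable, the inner loop consumes it.
  static List<String> correlate(List<Dept> left, Function<Dept, List<Emp>> right) {
    List<String> rows = new ArrayList<>();
    for (Dept cor0 : left) {
      for (Emp e : right.apply(cor0)) {
        rows.add(cor0.name + ": " + e.name);
      }
    }
    return rows;
  }

  static List<String> run() {
    List<Dept> depts = Arrays.asList(
        new Dept(10, "Sales"), new Dept(20, "Eng"), new Dept(30, "Ops"));
    return correlate(depts, CorrelateSketch::empsOf);
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```

In this sketch a department with no employees (Ops) produces no rows, which corresponds to inner-join semantics; a LEFT, SEMI or ANTI variant would only change what the outer loop emits, not the way the variable ties the two inputs together.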


Julian

> On Mar 21, 2019, at 1:12 PM, Walaa Eldin Moustafa <[email protected]> 
> wrote:
> 
> I would vote for number 3. In addition, I would not present Correlate
> as a nested loops join. Moreover, nested loops, hash and merge joins
> should be able to map to both Join or Correlate logical ones when
> possible (no inherent correlation between logical join type and
> physical types).
> 
> On Thu, Mar 21, 2019 at 11:55 AM Julian Hyde <[email protected]> wrote:
>> 
>> I have a few ideas for refactorings. (I’m not convinced by any of them, but 
>> let me know which you like.)
>> 
>> 1. Get rid of SemiJoinType. It is mis-named (it is not used by SemiJoin, it 
>> is used by Correlate, but in a field called joinType).
>> 
>> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has the 
>> same set of values as SemiJoinType, but it has a better name.
>> 
>> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType 
>> for everything. We would have to add SEMI and ANTI values. Also some methods 
>> to find out whether the resulting row type contains fields from the left and 
>> right inputs or just the left input.
>> 
>> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and 
>> Correlate implement it. It would have methods that say whether the LHS and 
>> RHS generate nulls, and whether the output row type contains columns from 
>> the right input. This seems attractive because it lets Join, SemiJoin and 
>> Correlate continue to be structurally different.
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <[email protected]> wrote:
>>> 
>>> SubPlan (in Postgres’ terms) is a Postgres physical relational node that 
>>> evaluates a correlated subquery. What I mean is that a correlated subquery 
>>> that can’t be decorrelated can’t be implemented by hashjoin or mergejoin. 
>>> But that is off topic.
>>> 
>>> Thanks ~
>>> Haisheng Yuan
>>> ------------------------------------------------------------------
>>> From: Walaa Eldin Moustafa<[email protected]>
>>> Date: 2019-03-21 09:31:41
>>> To: <[email protected]>
>>> Cc: Stamatis Zampetakis<[email protected]>
>>> Subject: Re: Re: Join, SemiJoin, Correlate
>>> 
>>> Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
>>> NOT IN, NOT EXISTS, and 2) is used as an equivalent to nested loops
>>> join. The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
>>> rewritten as semi/anti joins, and 2) nested loops join is more of a
>>> physical operator.
>>> 
>>> It seems that the minimal set of logical join types is INNER, LEFT,
>>> RIGHT, OUTER, SEMI, ANTI.
>>> 
>>> So I think Calcite could have one LogicalJoin operator with an
>>> attribute to specify the join type (from the above), and a number of
>>> physical join operators (hash, merge, nested loops) whose
>>> implementation details depend on the join type.
>>> 
>>> What we lose by this model is the structure of the query (whether
>>> there was a sub-plan or not), but I would say that this is actually
>>> what is desired from a logical representation -- to abstract away from
>>> how the query is written, and how it is structured, as long as there
>>> is a canonical representation. There could also be a world where both
>>> models coexist (Correlate first then Decorrelate but in the light of a
>>> single logical join operator?).
>>> 
>>> @Haisheng, generally, a sub-plan can also be implemented using a
>>> variant of hash or merge joins as long as we evaluate the sub-plan
>>> independently (without the join predicate), but that is up to the
>>> optimizer.
>>> 
>>> Thanks,
>>> Walaa.
>>> 
>>> On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <[email protected]> 
>>> wrote:
>>>> 
>>>> SemiJoinType and its relationship with JoinRelType do confuse me a little 
>>>> bit.
>>>> 
>>>> But I don’t think we should get rid of LogicalCorrelate. It is useful to 
>>>> represent the lateral or correlated subquery (aka SubPlan in Postgres 
>>>> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in 
>>>> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented 
>>>> as HashJoin or MergeJoin.
>>>> 
>>>> Thanks ~
>>>> Haisheng Yuan
>>>> ------------------------------------------------------------------
>>>> From: Stamatis Zampetakis<[email protected]>
>>>> Date: 2019-03-21 07:13:15
>>>> To: <[email protected]>
>>>> Subject: Re: Join, SemiJoin, Correlate
>>>> 
>>>> I have bumped into this quite a few times and I think we should really try
>>>> to improve the design of the join hierarchy.
>>>> 
>>>> From a logical point of view I think it makes sense to have the following
>>>> operators:
>>>> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
>>>> 
>>>> I have not yet thought thoroughly about what should become a class, and what a
>>>> property of the class (e.g., JoinRelType, SemiJoinType).
>>>> 
>>>> Moreover, Correlate as it is right now, is basically a nested loop join (as
>>>> its Javadoc also indicates).
>>>> Nested loop join is most often encountered as a physical operator so I am
>>>> not sure if it should remain as is (in particular the LogicalCorrelate).
>>>> As we do not have HashJoin, MergeJoin, etc., operators at the logical
>>>> level, I think we should not have a NestedLoopJoin (aka., 
>>>> LogicalCorrelate).
>>>> There are valid reasons why Correlate was introduced in the first place but
>>>> I think we should rethink a bit the design and the needs.
>>>> 
>>>> @Julian: I do not know to what extent you would like to rethink the
>>>> hierarchy but I have the impression that even small changes can easily
>>>> break backward compatibility.
>>>> 
>>>> 
>>>> On Wed, Mar 20, 2019 at 8:07 PM, Julian Hyde <[email protected]>
>>>> wrote:
>>>> 
>>>>> I just discovered that Correlate, which is neither a Join nor a SemiJoin,
>>>>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
>>>>> 
>>>>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
