I agree with Stamatis that this would be a useful feature. The first step would 
be log a jira case with a clear description of the problem. Can you describe 
exactly what you mean by ‘mergeable’, ‘redundant' and ’squash’?

I can’t follow your example, Ian. Can you find a simpler example, or possibly 
express in SQL?

Decorrelation (and introducing a ‘value generator’ for all, or a superset of, 
the values of the correlating variable) is hard for general queries. I think it 
would be wrong to try too hard to produce a minimal query when decorrelating. 
Better to produce something that works, using a simple algorithm, and simplify 
it later.

Julian



> On Mar 30, 2023, at 10:29 AM, Ian Bertolacci 
> <[email protected]> wrote:
> 
> Stamatis,
> Thanks! I’ll take a look at this.
> Is it possible to avoid the creation of these kinds of joins correlated 
> subqueries? Or does that still have the same issue as general joins?
> 
> Thanks!
> -Ian
> 
> From: Stamatis Zampetakis <[email protected]>
> Date: Thursday, March 30, 2023 at 1:41 AM
> To: [email protected] <[email protected]>
> Subject: [External Sender] Re: Rules to squash redundant join sides?
> Hey Ian,
> 
> I think you are referring to the problem of query minimization and at
> the moment we don't have any such rules in Calcite but it would be a
> valuable contribution.
> Apart from the algorithm itself, it might be necessary to introduce
> some new metadata provider for PK-FK relationships otherwise dropping
> a join may remove duplicates and change the semantics of the plan.
> 
> You can find some previous discussion here [1].
> 
> Best,
> Stamatis
> 
> [1] 
> https://urldefense.com/v3/__https://lists.apache.org/thread/vsq59yhfj4glf7mgpf9n6j255myhs0so__;!!Iz9xO38YGHZK!7_nW_9SgSRGAzlQa7TmPcod2_Xhqk41iPbC7Wzi-IoMCPhnEG1ZYGnknVSxbgOx-CLhOl2WSB_2MMTXI1BMG$<https://urldefense.com/v3/__https:/lists.apache.org/thread/vsq59yhfj4glf7mgpf9n6j255myhs0so__;!!Iz9xO38YGHZK!7_nW_9SgSRGAzlQa7TmPcod2_Xhqk41iPbC7Wzi-IoMCPhnEG1ZYGnknVSxbgOx-CLhOl2WSB_2MMTXI1BMG$>
> 
> On Thu, Mar 30, 2023 at 1:27 AM Ian Bertolacci
> <[email protected]> wrote:
>> 
>> Howdy,
>> Is there a collection of rules which squash a tree of binary joins if the 
>> same side of each join is mergeable?
>> 
>> For example:
>> ```
>> 201:LogicalProject(P4=[$70], P5=[$72])
>> └─ 199:LogicalJoin(condition=[=($0, $71)], joinType=[left])
>>   ├─ 190:LogicalProject(...)
>>   |  └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left])
>>   |     ├─ 164:QueryTableScan(table=[[Query, T123]])
>>   |     └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
>>   |        └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12])
>>   |           └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)])
>>   |              └─ 165:QueryTableScan(table=[[QUERY, T893]])
>>   └─ 197:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
>>      └─ 195:LogicalProject(C5633_586=[$85], C5633_203=[$45])
>>         └─ 193:LogicalFilter(condition=[IS_NOT_NULL($85)])
>>            └─ 172:QueryTableScan(table=[[QUERY, T893]])
>> ```
>> 
>> Can be simplified to one join as something like:
>> ```
>> 201:LogicalProject(P4=[$lhs+1)], P5=[$lhs+2])
>> └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left])
>>   ├─ 164:QueryTableScan(table=[[Query, T123]])
>>   └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1), 
>> EXPR$0=[ARRAY_AGG($2)])
>>      └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12], C5633_203=[$45])
>>         └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)])
>>            └─ 165:QueryTableScan(table=[[QUERY, T893]])
>> ```
>> 
>> 
>> I spent some time trying applying various CoreRules, but didn’t immediately 
>> see anything I wanted.
>> I figure that there exists some set of existing rules which when applied 
>> together would accomplish what we want here.
>> 
>> The joins arise from correlated subqueries, but are also synthetically 
>> generated, so it is not as simple as asking the user to manually do a single 
>> join.
>> 
>> Thanks!
>> -Ian Bertolacci

Reply via email to