I agree with Stamatis that this would be a useful feature. The first step would be log a jira case with a clear description of the problem. Can you describe exactly what you mean by ‘mergeable’, ‘redundant' and ’squash’?
I can’t follow your example, Ian. Can you find a simpler example, or possibly express in SQL? Decorrelation (and introducing a ‘value generator’ for all, or a superset of, the values of the correlating variable) is hard for general queries. I think it would be wrong to try too hard to produce a minimal query when decorrelating. Better to produce something that works, using a simple algorithm, and simplify it later. Julian > On Mar 30, 2023, at 10:29 AM, Ian Bertolacci > <[email protected]> wrote: > > Stamatis, > Thanks! I’ll take a look at this. > Is it possible to avoid the creation of these kinds of joins correlated > subqueries? Or does that still have the same issue as general joins? > > Thanks! > -Ian > > From: Stamatis Zampetakis <[email protected]> > Date: Thursday, March 30, 2023 at 1:41 AM > To: [email protected] <[email protected]> > Subject: [External Sender] Re: Rules to squash redundant join sides? > Hey Ian, > > I think you are referring to the problem of query minimization and at > the moment we don't have any such rules in Calcite but it would be a > valuable contribution. > Apart from the algorithm itself, it might be necessary to introduce > some new metadata provider for PK-FK relationships otherwise dropping > a join may remove duplicates and change the semantics of the plan. > > You can find some previous discussion here [1]. > > Best, > Stamatis > > [1] > https://urldefense.com/v3/__https://lists.apache.org/thread/vsq59yhfj4glf7mgpf9n6j255myhs0so__;!!Iz9xO38YGHZK!7_nW_9SgSRGAzlQa7TmPcod2_Xhqk41iPbC7Wzi-IoMCPhnEG1ZYGnknVSxbgOx-CLhOl2WSB_2MMTXI1BMG$<https://urldefense.com/v3/__https:/lists.apache.org/thread/vsq59yhfj4glf7mgpf9n6j255myhs0so__;!!Iz9xO38YGHZK!7_nW_9SgSRGAzlQa7TmPcod2_Xhqk41iPbC7Wzi-IoMCPhnEG1ZYGnknVSxbgOx-CLhOl2WSB_2MMTXI1BMG$> > > On Thu, Mar 30, 2023 at 1:27 AM Ian Bertolacci > <[email protected]> wrote: >> >> Howdy, >> Is there a collection of rules which squash a tree of binary joins if the >> same side of each join is mergeable? >> >> For example: >> ``` >> 201:LogicalProject(P4=[$70], P5=[$72]) >> └─ 199:LogicalJoin(condition=[=($0, $71)], joinType=[left]) >> ├─ 190:LogicalProject(...) >> | └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left]) >> | ├─ 164:QueryTableScan(table=[[Query, T123]]) >> | └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)]) >> | └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12]) >> | └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)]) >> | └─ 165:QueryTableScan(table=[[QUERY, T893]]) >> └─ 197:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)]) >> └─ 195:LogicalProject(C5633_586=[$85], C5633_203=[$45]) >> └─ 193:LogicalFilter(condition=[IS_NOT_NULL($85)]) >> └─ 172:QueryTableScan(table=[[QUERY, T893]]) >> ``` >> >> Can be simplified to one join as something like: >> ``` >> 201:LogicalProject(P4=[$lhs+1)], P5=[$lhs+2]) >> └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left]) >> ├─ 164:QueryTableScan(table=[[Query, T123]]) >> └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1), >> EXPR$0=[ARRAY_AGG($2)]) >> └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12], C5633_203=[$45]) >> └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)]) >> └─ 165:QueryTableScan(table=[[QUERY, T893]]) >> ``` >> >> >> I spent some time trying applying various CoreRules, but didn’t immediately >> see anything I wanted. >> I figure that there exists some set of existing rules which when applied >> together would accomplish what we want here. >> >> The joins arise from correlated subqueries, but are also synthetically >> generated, so it is not as simple as asking the user to manually do a single >> join. >> >> Thanks! >> -Ian Bertolacci
