Since these are inner joins, you could convert Join(i0, i1, c0 and c1 and c2)
into Filter(c2) Filter(c1) Filter(c0) CrossJoin(i0, i1) without changing semantics. I believe that’s what LoptOptimizeMultiJoinRule does - it shreds the joins into a join graph and reassembles them using a greedy algorithm. > On Mar 14, 2025, at 8:29 AM, Steven Phillips <ste...@dremio.com.INVALID> > wrote: > > You might try looking into the MultiJoinOptimizeBushy rule. I think it also > is not doing exactly what you want, but it might be a good starting point > for a rule that does. > > > On Fri, Mar 14, 2025 at 1:35 AM Mads Sejer Pedersen <s...@itu.dk.invalid> > wrote: > >> Hi people, >> >> I am doing some benchmarking with Calcite for the sql-api in Apache Wayang >> that requires typically multiconditional joins to be split into "binary" >> joins ala: >> LogicalJoin(condition=[AND(=($0, $27), =($10, $28), =($34, $2))], >> joinType=[inner]): rowcount = 118.65234375, cumulative cost = 1038.96484375 >> LogicalJoin(condition=[=($0, $11)], joinType=[inner]): >> rowcount = 351.5625, cumulative cost = 820.3125 >> LogicalJoin(condition=[=($0, $3)], joinType=[inner]): >> rowcount = 93.75, cumulative cost = 343.75 >> LogicalFilter(condition=[SEARCH($1, >> Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))]): rowcount >> = 25.0, cumulative cost = 125.0 >> LogicalTableScan(table=[[postgres, site]]): rowcount >> = 100.0, cumulative cost = 100.0 >> LogicalFilter(condition=[SEARCH($6, >> Sarg[[10..100000]])]): rowcount = 25.0, cumulative cost = 125.0 >> LogicalTableScan(table=[[postgres, so_user]]): >> rowcount = 100.0, cumulative cost = 100.0 >> LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]): >> rowcount = 25.0, cumulative cost = 125.0 >> LogicalTableScan(table=[[postgres, question]]): >> rowcount = 100.0, cumulative cost = 100.0 >> LogicalTableScan(table=[[postgres, answer]]): rowcount = >> 100.0, cumulative cost = 100.0 >> >> >> BinaryJoin(condition=[=($60, $2)], joinType=[inner]) >> BinaryJoin(condition=[=($10, $41)], joinType=[inner]) >> BinaryJoin(condition=[=($0, $27)], joinType=[inner]) >> LogicalJoin(condition=[=($0, $11)], joinType=[inner]) >> LogicalJoin(condition=[=($0, $3)], joinType=[inner]) >> LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11), >> 'gaming':CHAR(11), 'mathematica']:CHAR(11))]) >> LogicalTableScan(table=[[postgres, site]]) >> LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])]) >> LogicalTableScan(table=[[postgres, so_user]]) >> LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]) >> LogicalTableScan(table=[[postgres, question]]) >> LogicalTableScan(table=[[postgres, answer]]) >> LogicalTableScan(table=[[postgres, answer]]) >> LogicalTableScan(table=[[postgres, answer]]) >> >> Is this something that is already supported in Calcite? I have looked at >> current Calcite rules; JoinToMultiJoinRule, LoptOptimizeJoinRule, but they >> don't quite fit my use case. >> Furthermore, if it is not supported, how would one go about implementing >> such a split? I have looked at a rules-based implementation using the >> hep-planner. But I am having issues with how to translate the RexInputRef's >> indexes to the "right" place, as I need the indexes to always point to the >> new joining table rows. >> >>