Since these are inner joins, you could convert

  Join(i0, i1, c0 and c1 and c2)

into

  Filter(c2)
    Filter(c1)
      Filter(c0)
        CrossJoin(i0, i1)

without changing semantics.

I believe that’s what LoptOptimizeMultiJoinRule does - it shreds the joins into 
a join graph and reassembles them using a greedy algorithm.


> On Mar 14, 2025, at 8:29 AM, Steven Phillips <ste...@dremio.com.INVALID> 
> wrote:
> 
> You might try looking into the MultiJoinOptimizeBushy rule. I think it also
> is not doing exactly what you want, but it might be a good starting point
> for a rule that does.
> 
> 
> On Fri, Mar 14, 2025 at 1:35 AM Mads Sejer Pedersen <s...@itu.dk.invalid>
> wrote:
> 
>> Hi people,
>> 
>> I am doing some benchmarking with Calcite for the sql-api in Apache Wayang
>> that requires typically multiconditional joins to be split into "binary"
>> joins ala:
>> LogicalJoin(condition=[AND(=($0, $27), =($10, $28), =($34, $2))],
>> joinType=[inner]): rowcount = 118.65234375, cumulative cost = 1038.96484375
>>                LogicalJoin(condition=[=($0, $11)], joinType=[inner]):
>> rowcount = 351.5625, cumulative cost = 820.3125
>>                  LogicalJoin(condition=[=($0, $3)], joinType=[inner]):
>> rowcount = 93.75, cumulative cost = 343.75
>>                    LogicalFilter(condition=[SEARCH($1,
>> Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))]): rowcount
>> = 25.0, cumulative cost = 125.0
>>                      LogicalTableScan(table=[[postgres, site]]): rowcount
>> = 100.0, cumulative cost = 100.0
>>                    LogicalFilter(condition=[SEARCH($6,
>> Sarg[[10..100000]])]): rowcount = 25.0, cumulative cost = 125.0
>>                      LogicalTableScan(table=[[postgres, so_user]]):
>> rowcount = 100.0, cumulative cost = 100.0
>>                  LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]):
>> rowcount = 25.0, cumulative cost = 125.0
>>                    LogicalTableScan(table=[[postgres, question]]):
>> rowcount = 100.0, cumulative cost = 100.0
>>                LogicalTableScan(table=[[postgres, answer]]): rowcount =
>> 100.0, cumulative cost = 100.0
>> 
>> 
>> BinaryJoin(condition=[=($60, $2)], joinType=[inner])
>>  BinaryJoin(condition=[=($10, $41)], joinType=[inner])
>>    BinaryJoin(condition=[=($0, $27)], joinType=[inner])
>>      LogicalJoin(condition=[=($0, $11)], joinType=[inner])
>>        LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>>          LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11),
>> 'gaming':CHAR(11), 'mathematica']:CHAR(11))])
>>            LogicalTableScan(table=[[postgres, site]])
>>          LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])])
>>            LogicalTableScan(table=[[postgres, so_user]])
>>        LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])])
>>          LogicalTableScan(table=[[postgres, question]])
>>      LogicalTableScan(table=[[postgres, answer]])
>>    LogicalTableScan(table=[[postgres, answer]])
>>  LogicalTableScan(table=[[postgres, answer]])
>> 
>> Is this something that is already supported in Calcite? I have looked at
>> current Calcite rules; JoinToMultiJoinRule, LoptOptimizeJoinRule, but they
>> don't quite fit my use case.
>> Furthermore, if it is not supported, how would one go about implementing
>> such a split? I have looked at a rules-based implementation using the
>> hep-planner. But I am having issues with how to translate the RexInputRef's
>> indexes to the "right" place, as I need the indexes to always point to the
>> new joining table rows.
>> 
>> 

Reply via email to