Hi Mads,

Currently, the sql-api converts the logical plans produced by Calcite into
Wayang (relational) plans as-is. To achieve what you want, you will have to
extend the WayangJoinVisitor.java class. In particular, see:
https://github.com/apache/incubator-wayang/blob/24d8eec742f21fd5adfbd089e93f96271c6f5a63/wayang-api/wayang-api-sql/src/main/java/org/apache/wayang/api/sql/calcite/converter/WayangJoinVisitor.java#L49

It only supports equi-joins for now, but it could be extended to support
the other predicates you require.
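
To make the index bookkeeping concrete, here is a rough, standalone sketch
(plain Java, not Wayang or Calcite code) of how the three conjuncts in your
AND condition map onto the three BinaryJoin conditions in your target plan.
The class and method names are made up for illustration, and the widths
(27 fields on the left, 13 in answer) are assumptions inferred from the index
shifts in your example, not taken from the schema. In Calcite itself, the
conjuncts of a condition can be obtained with RelOptUtil.conjunctions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only: not part of Wayang or Calcite.
final class BinaryJoinSplit {

    /** One equality predicate =($a, $b) over the combined row of a join. */
    static final class EquiCondition {
        final int a;
        final int b;

        EquiCondition(int a, int b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public String toString() {
            return "=($" + a + ", $" + b + ")";
        }
    }

    /**
     * Rewrites each conjunct for a chain of single-condition joins in which
     * the right input is appended once per conjunct. The i-th join sees the
     * i-th copy of the right input, i * rightWidth fields further right.
     */
    static List<EquiCondition> split(List<EquiCondition> conjuncts,
                                     int leftWidth, int rightWidth) {
        List<EquiCondition> result = new ArrayList<>();
        for (int i = 0; i < conjuncts.size(); i++) {
            int shift = i * rightWidth;
            EquiCondition c = conjuncts.get(i);
            result.add(new EquiCondition(
                    shiftIfRight(c.a, leftWidth, shift),
                    shiftIfRight(c.b, leftWidth, shift)));
        }
        return result;
    }

    /** Only references into the right input (index >= leftWidth) move. */
    private static int shiftIfRight(int index, int leftWidth, int shift) {
        return index >= leftWidth ? index + shift : index;
    }

    public static void main(String[] args) {
        // Conjuncts of AND(=($0, $27), =($10, $28), =($34, $2)).
        List<EquiCondition> conjuncts = Arrays.asList(
                new EquiCondition(0, 27),
                new EquiCondition(10, 28),
                new EquiCondition(34, 2));
        // Assumed widths: 27 fields on the left, 13 fields in answer.
        System.out.println(split(conjuncts, 27, 13));
        // prints [=($0, $27), =($10, $41), =($60, $2)]
    }
}
```

This only reproduces the index arithmetic of your target plan, listed from
the innermost BinaryJoin outwards. Whether joining the right input once per
predicate is the plan shape you want is a separate question.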

Hope this helps.

Best,

Kaustubh


On Fri, Mar 14, 2025 at 2:11 PM Mads Sejer Pedersen <s...@itu.dk.invalid>
wrote:

> Hi people,
>
> I am doing some benchmarking with Calcite for the sql-api in Apache Wayang
> that typically requires multi-condition joins to be split into "binary"
> joins, along these lines:
> LogicalJoin(condition=[AND(=($0, $27), =($10, $28), =($34, $2))], joinType=[inner]): rowcount = 118.65234375, cumulative cost = 1038.96484375
>   LogicalJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 351.5625, cumulative cost = 820.3125
>     LogicalJoin(condition=[=($0, $3)], joinType=[inner]): rowcount = 93.75, cumulative cost = 343.75
>       LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))]): rowcount = 25.0, cumulative cost = 125.0
>         LogicalTableScan(table=[[postgres, site]]): rowcount = 100.0, cumulative cost = 100.0
>       LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])]): rowcount = 25.0, cumulative cost = 125.0
>         LogicalTableScan(table=[[postgres, so_user]]): rowcount = 100.0, cumulative cost = 100.0
>     LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]): rowcount = 25.0, cumulative cost = 125.0
>       LogicalTableScan(table=[[postgres, question]]): rowcount = 100.0, cumulative cost = 100.0
>   LogicalTableScan(table=[[postgres, answer]]): rowcount = 100.0, cumulative cost = 100.0
>
>
> BinaryJoin(condition=[=($60, $2)], joinType=[inner])
>   BinaryJoin(condition=[=($10, $41)], joinType=[inner])
>     BinaryJoin(condition=[=($0, $27)], joinType=[inner])
>       LogicalJoin(condition=[=($0, $11)], joinType=[inner])
>         LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>           LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))])
>             LogicalTableScan(table=[[postgres, site]])
>           LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])])
>             LogicalTableScan(table=[[postgres, so_user]])
>         LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])])
>           LogicalTableScan(table=[[postgres, question]])
>       LogicalTableScan(table=[[postgres, answer]])
>     LogicalTableScan(table=[[postgres, answer]])
>   LogicalTableScan(table=[[postgres, answer]])
>
> Does anyone know of a Calcite rule that already does something like this,
> or have a general idea of how such a thing could be implemented? I tried
> using the HepPlanner with a rule-based approach, but there are some
> issues with how Wayang handles join inputs (i.e. left and right) versus how
> Calcite handles them: Calcite uses more of a cross type based on the rows
> of both the left and right inputs. Thanks
>
>
