Hi Itiel,
That is a very good point, thank you!

Best,
Juri

On 2025/04/01 15:16:55 Itiel Sadeh wrote:
> Hi Juri,
> 
> Regarding your solution, I think it will fail for you in case of queries
> that must have cross join, such as "select * from t1 join t2", so I will be
> cautious with that. Perhaps you can use "makeHugeCost()" instead of
> "makeInfiniteCost()".
> 
> 
> On Tue, Apr 1, 2025 at 5:00 PM Juri Petersen <j...@apache.org> wrote:
> 
> > For anyone reading this in the future, here is our approach to a
> > "solution":
> >
> > If you want to discourage the optimizer from generating plans with
> > cartesian products and multi-conditional joins, a trivial way is to set a
> > high cost in your custom Join node when matched:
> >
> >         if (condition.isAlwaysTrue()) {
> >             RelOptCostFactory costFactory = planner.getCostFactory();
> >
> >             return costFactory.makeInfiniteCost();
> >         }
> >
> >         if (condition.isA(SqlKind.AND)) {
> >             RelOptCostFactory costFactory = planner.getCostFactory();
> >
> >             return costFactory.makeInfiniteCost();
> >         }
> >
> > This resolved the problems for us.
> >
> > Best,
> > Juri
> >
> >
> > On 2025/03/31 11:00:58 Alessandro Solimando wrote:
> > > Hi Zoi,
> > > in Calcite rules transform a valid (sub-)plan into one or more other
> > valid
> > > plans, there are no rules "forbidding" anything that is a legal plan,
> > they
> > > just generate alternatives for the AND-OR graph.
> > >
> > > As I suggested a few emails before, your best shot to understand what's
> > > missing to get rid of the last cartesian product, is to enable extra
> > > debugging and see at what step the sought transformation should/could
> > > happen, and debug to see why it's not the case.
> > >
> > > Extra logging helps to get a hold of the specific rule application number
> > > or id of the object, as rules get applied recursively many times, so you
> > > need a conditional breakpoint on one of the two aforementioned conditions
> > > to be able to reach the relevant rule application.
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > On Mon, 31 Mar 2025 at 12:50, Zoi Kaoudi <zkao...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > jumping into the conversation, isn't there a calcite rule that forbids
> > > > cartesian products?
> > > >
> > > > Best
> > > > --
> > > > Zoi
> > > >
> > > > On 2025/03/31 07:46:26 Juri Petersen wrote:
> > > > > Hi,
> > > > > I tried applying the following rules for my example query:
> > > > >
> > > > >         final RuleSet wayangRules = RuleSets.ofList(
> > > > >                 CoreRules.FILTER_INTO_JOIN,
> > > > >                 CoreRules.MULTI_JOIN_OPTIMIZE_BUSHY,
> > > > >                 CoreRules.JOIN_COMMUTE,
> > > > >                 CoreRules.JOIN_ASSOCIATE
> > > > >        );
> > > > >
> > > > > The input tree looks like this:
> > > > >
> > > > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> > > > russian_movie=[MIN($1)])
> > > > >   LogicalProject(name=[$1], title=[$31])
> > > > >     LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11,
> > > > '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005),
> > =($30,
> > > > $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25),
> > =($21,
> > > > $26))])
> > > > >       LogicalJoin(condition=[true], joinType=[inner])
> > > > >         LogicalJoin(condition=[true], joinType=[inner])
> > > > >           LogicalJoin(condition=[true], joinType=[inner])
> > > > >             LogicalJoin(condition=[true], joinType=[inner])
> > > > >               LogicalJoin(condition=[true], joinType=[inner])
> > > > >                 LogicalJoin(condition=[true], joinType=[inner])
> > > > >                   LogicalTableScan(table=[[postgres, char_name]])
> > > > >                   LogicalTableScan(table=[[postgres, cast_info]])
> > > > >                 LogicalTableScan(table=[[postgres, company_name]])
> > > > >               LogicalTableScan(table=[[postgres, company_type]])
> > > > >             LogicalTableScan(table=[[postgres, movie_companies]])
> > > > >           LogicalTableScan(table=[[postgres, role_type]])
> > > > >         LogicalTableScan(table=[[postgres, title]])
> > > > >
> > > > > The resulting converted tree is close to what I desire, however one
> > > > multi-condition join can't be pushed down, leading to a tree with on
> > > > cartesian product remaining:
> > > > >
> > > > > WayangAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> > > > russian_movie=[MIN($1)])
> > > > >   WayangProject(name=[$1], title=[$31])
> > > > >     WayangJoin(condition=[AND(=($24, $30), =($28, $13))],
> > > > joinType=[inner])
> > > > >       WayangJoin(condition=[=($9, $24)], joinType=[inner])
> > > > >         WayangJoin(condition=[=($0, $10)], joinType=[inner])
> > > > >           WayangTableScan(table=[[postgres, char_name]])
> > > > >           WayangFilter(condition=[AND(LIKE($4, '%(voice)%'), LIKE($4,
> > > > '%(uncredited)%'))])
> > > > >             WayangTableScan(table=[[postgres, cast_info]])
> > > > >         WayangJoin(condition=[=($0, $11)], joinType=[inner])
> > > > >           WayangFilter(condition=[=($2, '[ru]')])
> > > > >             WayangTableScan(table=[[postgres, company_name]])
> > > > >           WayangJoin(condition=[=($0, $5)], joinType=[inner])
> > > > >             WayangTableScan(table=[[postgres, company_type]])
> > > > >             WayangTableScan(table=[[postgres, movie_companies]])
> > > > >       WayangJoin(condition=[true], joinType=[inner])
> > > > >         WayangFilter(condition=[=($1, 'actor')])
> > > > >           WayangTableScan(table=[[postgres, role_type]])
> > > > >         WayangFilter(condition=[>($4, 2005)])
> > > > >           WayangTableScan(table=[[postgres, title]])
> > > > >
> > > > >
> > > > > Looking at my input query, the role_type table has a specified join
> > > > condition on cast_info.
> > > > > Am I missing a detail that prevents me from being able to deconstruct
> > > > the multi-conditional join here?
> > > > >
> > > > > Any help would be greatly appreciated.
> > > > >
> > > > > Best,
> > > > > Juri
> > > > >
> > > > > On 2025/03/27 10:57:26 Dong Silun wrote:
> > > > > > Hi Juri,
> > > > > > As Alessandro said, the Join order prevents the predicates from
> > being
> > > > pushed down to the ideal position.
> > > > > > You can try to use the two rules CoreRules.JOIN_COMMUTE and
> > > > CoreRules.JOIN_ASSOCIATE instead of the heuristic/dp join reorder
> > > > algorithm. In the case of all inner joins, CoreRules.JOIN_COMMUTE and
> > > > CoreRules.JOIN_ASSOCIATE will generate all join order possibilities
> > (when
> > > > using VolcanoPlanner), so as to get the join order that can smoothly
> > push
> > > > all predicates down to the ideal position (combined with the
> > FilterIntoJoin
> > > > rule).
> > > > > > However, the optimization process may be time-consuming because
> > there
> > > > are a total of 7 tables involved in join and the commutative and
> > > > associative rules are used to enumerate every possibility.
> > > > > > I didn't actually run your example, I just provided an idea, I
> > hope it
> > > > can help you.
> > > > > >
> > > > > > Best,
> > > > > > Silun
> > > > > >
> > > > > > ________________________________
> > > > > > 发件人: Juri Petersen <j...@apache.org>
> > > > > > 发送时间: 2025年3月27日 16:40
> > > > > > 收件人: dev@calcite.apache.org <dev@calcite.apache.org>
> > > > > > 主题: Re: FIlterIntoJoinRule applied without complete result
> > > > > >
> > > > > > Hi,
> > > > > > Thank you for your answer!
> > > > > > I see your point about join ordering, thats also why I tried using
> > the
> > > > MULTI_JOIN_OPTIMZE CoreRule before.
> > > > > > I tried it again just now, and these rules still don't resolve my
> > > > problem:
> > > > > >
> > > > > >         final RuleSet rules = RuleSets.ofList(
> > > > > >                 CoreRules.FILTER_INTO_JOIN,
> > > > > >                 CoreRules.MULTI_JOIN_OPTIMIZE
> > > > > >         );
> > > > > >
> > > > > > I tried both the smart and dumb FILTER_INTO_JOIN and also the bushy
> > > > version of MULTI_JOIN_OPTIMIZE.
> > > > > >
> > > > > > The message I get when trying to optimize the plan is the
> > following:
> > > > > >
> > > > > > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There
> > are
> > > > not enough rules to produce a node with desired properties:
> > > > convention=NONE. All the inputs have relevant nodes, however the cost
> > is
> > > > still infinite.
> > > > > > Root: rel#55:RelSubset#15.NONE
> > > > > > Original rel:
> > > > > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> > > > russian_movie=[MIN($1)]): rowcount = 1.0, cumulative cost =
> > > > 1.0101010125097225E14, id = 30
> > > > > >   LogicalProject(name=[$1], title=[$31]): rowcount =
> > > > 120135.49804687499, cumulative cost = 1.01010101250971E14, id = 29
> > > > > >     LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11,
> > > > '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005),
> > =($30,
> > > > $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25),
> > =($21,
> > > > $26))]): rowcount = 120135.49804687499, cumulative cost =
> > > > 1.010101011308355E14, id = 26
> > > > > >       LogicalJoin(condition=[true], joinType=[inner]): rowcount =
> > > > 1.0E14, cumulative cost = 1.010101010107E14, id = 25
> > > > > >         LogicalJoin(condition=[true], joinType=[inner]): rowcount =
> > > > 1.0E12, cumulative cost = 1.0101010106E12, id = 21
> > > > > >           LogicalJoin(condition=[true], joinType=[inner]):
> > rowcount =
> > > > 1.0E10, cumulative cost = 1.01010105E10, id = 17
> > > > > >             LogicalJoin(condition=[true], joinType=[inner]):
> > rowcount
> > > > = 1.0E8, cumulative cost = 1.010104E8, id = 13
> > > > > >               LogicalJoin(condition=[true], joinType=[inner]):
> > > > rowcount = 1000000.0, cumulative cost = 1010300.0, id = 9
> > > > > >                 LogicalJoin(condition=[true], joinType=[inner]):
> > > > rowcount = 10000.0, cumulative cost = 10200.0, id = 5
> > > > > >                   LogicalTableScan(table=[[postgres, char_name]]):
> > > > rowcount = 100.0, cumulative cost = 100.0, id = 1
> > > > > >                   LogicalTableScan(table=[[postgres, cast_info]]):
> > > > rowcount = 100.0, cumulative cost = 100.0, id = 3
> > > > > >                 LogicalTableScan(table=[[postgres, company_name]]):
> > > > rowcount = 100.0, cumulative cost = 100.0, id = 7
> > > > > >               LogicalTableScan(table=[[postgres, company_type]]):
> > > > rowcount = 100.0, cumulative cost = 100.0, id = 11
> > > > > >             LogicalTableScan(table=[[postgres, movie_companies]]):
> > > > rowcount = 100.0, cumulative cost = 100.0, id = 15
> > > > > >           LogicalTableScan(table=[[postgres, role_type]]):
> > rowcount =
> > > > 100.0, cumulative cost = 100.0, id = 19
> > > > > >         LogicalTableScan(table=[[postgres, title]]): rowcount =
> > 100.0,
> > > > cumulative cost = 100.0, id = 23
> > > > > >
> > > > > >
> > > > > > I hope this specifies my problem a bit more.
> > > > > >
> > > > > > Best,
> > > > > > Juri
> > > > > >
> > > > > > On 2025/03/26 13:54:33 Alessandro Solimando wrote:
> > > > > > > Hi Juri,
> > > > > > > it's true that the tables in the joins are fully connected via
> > the
> > > > > > > predicates, but order matters and the concrete order I see can't
> > do
> > > > without
> > > > > > > cartesian products: it's joining "company_type" with other tables
> > > > before
> > > > > > > joining with "movie_companies", but the only predicate in the
> > where
> > > > clause
> > > > > > > around "company_type" is "ct.id = mc.company_type_id", which
> > can't
> > > > be used
> > > > > > > in that subtree as "movie_companies" hasn't been joined yet, so
> > > > basically
> > > > > > > it's a join ordering "issue" (which could not be an issue at all
> > > > based on
> > > > > > > the size of the tables, selectivity of the predicates etc.).
> > > > > > >
> > > > > > > Are you using rules for join ordering like LoptOptimizeJoinRule
> > > > > > > <
> > > >
> > https://github.com/apache/calcite/blob/bfbe8930f4ed7ba8da530e862e212a057191cfa3/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java
> > > > >
> > > > > > > in your program (the set of rules you use could help people
> > provide a
> > > > > > > better answer)? If you are using 1.39.0 there is a new join
> > ordering
> > > > > > > algorithm, you can refer to CALCITE-6846
> > > > > > > <https://issues.apache.org/jira/browse/CALCITE-6846> and
> > related PR
> > > > for
> > > > > > > more details which should be exhaustive.
> > > > > > >
> > > > > > > If you think you have added all the rules and you can't still
> > get a
> > > > sense
> > > > > > > of why you end up with a particular plan, you can activate the
> > > > extended
> > > > > > > logs around rule applications and transformations to be able to
> > then
> > > > put
> > > > > > > breakpoints in the involved rules at the specific step which is
> > > > generally
> > > > > > > tricky as rules are called multiple times. You can refer to these
> > > > slides
> > > > > > >
> > > >
> > https://www.slideshare.net/StamatisZampetakis/debugging-planning-issues-using-calcites-builtin-loggers
> > > > > > > (there is also the full video and other links at
> > > > > > > https://calcite.apache.org/community/, the talk is "Debugging
> > > > planning
> > > > > > > issues using Calcite’s built in loggers").
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alessandro
> > > > > > >
> > > > > > > On Wed, 26 Mar 2025 at 11:10, Juri Petersen <j...@itu.dk.invalid
> > >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > As mentioned by Mads in a previous mail, we are working on a
> > > > SQL-API in
> > > > > > > > Apache Wayang.
> > > > > > > > We are trying to set up experiments with the JOB Benchmark and
> > see
> > > > that we
> > > > > > > > have to rewrite queries to explicit INNER JOINS for them to be
> > > > parsed
> > > > > > > > correctly.
> > > > > > > > Since we are planning to do other benchmarks with thousands of
> > > > queries,
> > > > > > > > rewriting is not feasible.
> > > > > > > >
> > > > > > > > Given this (not-rewritten) query from JOB:
> > > > > > > >
> > > > > > > > SELECT MIN(chn.name) AS uncredited_voiced_character,
> > > > > > > >        MIN(t.title) AS russian_movie
> > > > > > > > FROM postgres.char_name AS chn,
> > > > > > > >      postgres.cast_info AS ci,
> > > > > > > >      postgres.company_name AS cn,
> > > > > > > >      postgres.company_type AS ct,
> > > > > > > >      postgres.movie_companies AS mc,
> > > > > > > >      postgres.role_type AS rt,
> > > > > > > >      postgres.title AS t
> > > > > > > > WHERE ci.note LIKE '%(voice)%'
> > > > > > > >   AND ci.note LIKE '%(uncredited)%'
> > > > > > > >   AND cn.country_code = '[ru]'
> > > > > > > >   AND rt.role = 'actor'
> > > > > > > >   AND t.production_year > 2005
> > > > > > > >   AND t.id = mc.movie_id
> > > > > > > >   AND t.id = ci.movie_id
> > > > > > > >   AND ci.movie_id = mc.movie_id
> > > > > > > >   AND chn.id = ci.person_role_id
> > > > > > > >   AND rt.id = ci.role_id
> > > > > > > >   AND cn.id = mc.company_id
> > > > > > > >   AND ct.id = mc.company_type_id;
> > > > > > > >
> > > > > > > > We use calcite to get the following tree:
> > > > > > > >
> > > > > > > > LogicalAggregate(group=[{}],
> > uncredited_voiced_character=[MIN($0)],
> > > > > > > > russian_movie=[MIN($1)])
> > > > > > > >   LogicalProject(name=[$1], title=[$31])
> > > > > > > >     LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'),
> > LIKE($11,
> > > > > > > > '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34,
> > 2005),
> > > > =($30,
> > > > > > > > $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14,
> > > > $25), =($21,
> > > > > > > > $26))])
> > > > > > > >       LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >         LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >           LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >             LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >               LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >                 LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >                   LogicalTableScan(table=[[postgres,
> > char_name]])
> > > > > > > >                   LogicalTableScan(table=[[postgres,
> > cast_info]])
> > > > > > > >                 LogicalTableScan(table=[[postgres,
> > company_name]])
> > > > > > > >               LogicalTableScan(table=[[postgres,
> > company_type]])
> > > > > > > >             LogicalTableScan(table=[[postgres,
> > movie_companies]])
> > > > > > > >           LogicalTableScan(table=[[postgres, role_type]])
> > > > > > > >         LogicalTableScan(table=[[postgres, title]])
> > > > > > > >
> > > > > > > >
> > > > > > > > I then try to apply the CoreRules.FILTER_INTO_JOIN (tried smart
> > > > and dumb
> > > > > > > > version), in order to avoid the cartesian products, hoping to
> > push
> > > > the join
> > > > > > > > conditions into the respective LogicalJoins.
> > > > > > > > Heres the resulting tree:
> > > > > > > >
> > > > > > > > LogicalAggregate(group=[{}],
> > uncredited_voiced_character=[MIN($0)],
> > > > > > > > russian_movie=[MIN($1)])
> > > > > > > >   LogicalProject(name=[$1], title=[$31])
> > > > > > > >     LogicalJoin(condition=[=($24, $30)], joinType=[inner])
> > > > > > > >       LogicalJoin(condition=[=($28, $13)], joinType=[inner])
> > > > > > > >         LogicalJoin(condition=[AND(=($9, $24), =($14, $25),
> > =($21,
> > > > $26))],
> > > > > > > > joinType=[inner])
> > > > > > > >           LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >             LogicalJoin(condition=[true], joinType=[inner])
> > > > > > > >               LogicalJoin(condition=[=($0, $10)],
> > joinType=[inner])
> > > > > > > >                 LogicalTableScan(table=[[postgres, char_name]])
> > > > > > > >                 LogicalFilter(condition=[AND(LIKE($4,
> > '%(voice)%'),
> > > > > > > > LIKE($4, '%(uncredited)%'))])
> > > > > > > >                   LogicalTableScan(table=[[postgres,
> > cast_info]])
> > > > > > > >               LogicalFilter(condition=[=($2, '[ru]')])
> > > > > > > >                 LogicalTableScan(table=[[postgres,
> > company_name]])
> > > > > > > >             LogicalTableScan(table=[[postgres, company_type]])
> > > > > > > >           LogicalTableScan(table=[[postgres, movie_companies]])
> > > > > > > >         LogicalFilter(condition=[=($1, 'actor')])
> > > > > > > >           LogicalTableScan(table=[[postgres, role_type]])
> > > > > > > >       LogicalFilter(condition=[>($4, 2005)])
> > > > > > > >         LogicalTableScan(table=[[postgres, title]])
> > > > > > > >
> > > > > > > > Some of the conditions are pushed down, but we still have
> > remaining
> > > > > > > > cartesian products and a multi-condition join.
> > > > > > > > Looking at the input query, I would expect every Join to have a
> > > > condition,
> > > > > > > > as there are no unspecified joins, right?
> > > > > > > > What am I missing or what can we do to deconstruct the
> > > > multi-conditional
> > > > > > > > join and avoid cartesian products?
> > > > > > > >
> > > > > > > > Thanks in advance for any help!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Juri
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Reply via email to