Hi Zoi,
in Calcite rules transform a valid (sub-)plan into one or more other valid
plans, there are no rules "forbidding" anything that is a legal plan, they
just generate alternatives for the AND-OR graph.

As I suggested a few emails before, your best shot to understand what's
missing to get rid of the last cartesian product, is to enable extra
debugging and see at what step the sought transformation should/could
happen, and debug to see why it's not the case.

Extra logging helps to get a hold of the specific rule application number
or id of the object, as rules get applied recursively many times, so you
need a conditional breakpoint on one of the two aforementioned conditions
to be able to reach the relevant rule application.

Best regards,
Alessandro

On Mon, 31 Mar 2025 at 12:50, Zoi Kaoudi <zkao...@apache.org> wrote:

> Hi all,
>
> jumping into the conversation, isn't there a calcite rule that forbids
> cartesian products?
>
> Best
> --
> Zoi
>
> On 2025/03/31 07:46:26 Juri Petersen wrote:
> > Hi,
> > I tried applying the following rules for my example query:
> >
> >         final RuleSet wayangRules = RuleSets.ofList(
> >                 CoreRules.FILTER_INTO_JOIN,
> >                 CoreRules.MULTI_JOIN_OPTIMIZE_BUSHY,
> >                 CoreRules.JOIN_COMMUTE,
> >                 CoreRules.JOIN_ASSOCIATE
> >        );
> >
> > The input tree looks like this:
> >
> > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> russian_movie=[MIN($1)])
> >   LogicalProject(name=[$1], title=[$31])
> >     LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11,
> '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005), =($30,
> $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25), =($21,
> $26))])
> >       LogicalJoin(condition=[true], joinType=[inner])
> >         LogicalJoin(condition=[true], joinType=[inner])
> >           LogicalJoin(condition=[true], joinType=[inner])
> >             LogicalJoin(condition=[true], joinType=[inner])
> >               LogicalJoin(condition=[true], joinType=[inner])
> >                 LogicalJoin(condition=[true], joinType=[inner])
> >                   LogicalTableScan(table=[[postgres, char_name]])
> >                   LogicalTableScan(table=[[postgres, cast_info]])
> >                 LogicalTableScan(table=[[postgres, company_name]])
> >               LogicalTableScan(table=[[postgres, company_type]])
> >             LogicalTableScan(table=[[postgres, movie_companies]])
> >           LogicalTableScan(table=[[postgres, role_type]])
> >         LogicalTableScan(table=[[postgres, title]])
> >
> > The resulting converted tree is close to what I desire, however one
> multi-condition join can't be pushed down, leading to a tree with on
> cartesian product remaining:
> >
> > WayangAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> russian_movie=[MIN($1)])
> >   WayangProject(name=[$1], title=[$31])
> >     WayangJoin(condition=[AND(=($24, $30), =($28, $13))],
> joinType=[inner])
> >       WayangJoin(condition=[=($9, $24)], joinType=[inner])
> >         WayangJoin(condition=[=($0, $10)], joinType=[inner])
> >           WayangTableScan(table=[[postgres, char_name]])
> >           WayangFilter(condition=[AND(LIKE($4, '%(voice)%'), LIKE($4,
> '%(uncredited)%'))])
> >             WayangTableScan(table=[[postgres, cast_info]])
> >         WayangJoin(condition=[=($0, $11)], joinType=[inner])
> >           WayangFilter(condition=[=($2, '[ru]')])
> >             WayangTableScan(table=[[postgres, company_name]])
> >           WayangJoin(condition=[=($0, $5)], joinType=[inner])
> >             WayangTableScan(table=[[postgres, company_type]])
> >             WayangTableScan(table=[[postgres, movie_companies]])
> >       WayangJoin(condition=[true], joinType=[inner])
> >         WayangFilter(condition=[=($1, 'actor')])
> >           WayangTableScan(table=[[postgres, role_type]])
> >         WayangFilter(condition=[>($4, 2005)])
> >           WayangTableScan(table=[[postgres, title]])
> >
> >
> > Looking at my input query, the role_type table has a specified join
> condition on cast_info.
> > Am I missing a detail that prevents me from being able to deconstruct
> the multi-conditional join here?
> >
> > Any help would be greatly appreciated.
> >
> > Best,
> > Juri
> >
> > On 2025/03/27 10:57:26 Dong Silun wrote:
> > > Hi Juri,
> > > As Alessandro said, the Join order prevents the predicates from being
> pushed down to the ideal position.
> > > You can try to use the two rules CoreRules.JOIN_COMMUTE and
> CoreRules.JOIN_ASSOCIATE instead of the heuristic/dp join reorder
> algorithm. In the case of all inner joins, CoreRules.JOIN_COMMUTE and
> CoreRules.JOIN_ASSOCIATE will generate all join order possibilities (when
> using VolcanoPlanner), so as to get the join order that can smoothly push
> all predicates down to the ideal position (combined with the FilterIntoJoin
> rule).
> > > However, the optimization process may be time-consuming because there
> are a total of 7 tables involved in join and the commutative and
> associative rules are used to enumerate every possibility.
> > > I didn't actually run your example, I just provided an idea, I hope it
> can help you.
> > >
> > > Best,
> > > Silun
> > >
> > > ________________________________
> > > 发件人: Juri Petersen <j...@apache.org>
> > > 发送时间: 2025年3月27日 16:40
> > > 收件人: dev@calcite.apache.org <dev@calcite.apache.org>
> > > 主题: Re: FIlterIntoJoinRule applied without complete result
> > >
> > > Hi,
> > > Thank you for your answer!
> > > I see your point about join ordering, thats also why I tried using the
> MULTI_JOIN_OPTIMZE CoreRule before.
> > > I tried it again just now, and these rules still don't resolve my
> problem:
> > >
> > >         final RuleSet rules = RuleSets.ofList(
> > >                 CoreRules.FILTER_INTO_JOIN,
> > >                 CoreRules.MULTI_JOIN_OPTIMIZE
> > >         );
> > >
> > > I tried both the smart and dumb FILTER_INTO_JOIN and also the bushy
> version of MULTI_JOIN_OPTIMIZE.
> > >
> > > The message I get when trying to optimize the plan is the following:
> > >
> > > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are
> not enough rules to produce a node with desired properties:
> convention=NONE. All the inputs have relevant nodes, however the cost is
> still infinite.
> > > Root: rel#55:RelSubset#15.NONE
> > > Original rel:
> > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> russian_movie=[MIN($1)]): rowcount = 1.0, cumulative cost =
> 1.0101010125097225E14, id = 30
> > >   LogicalProject(name=[$1], title=[$31]): rowcount =
> 120135.49804687499, cumulative cost = 1.01010101250971E14, id = 29
> > >     LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11,
> '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005), =($30,
> $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25), =($21,
> $26))]): rowcount = 120135.49804687499, cumulative cost =
> 1.010101011308355E14, id = 26
> > >       LogicalJoin(condition=[true], joinType=[inner]): rowcount =
> 1.0E14, cumulative cost = 1.010101010107E14, id = 25
> > >         LogicalJoin(condition=[true], joinType=[inner]): rowcount =
> 1.0E12, cumulative cost = 1.0101010106E12, id = 21
> > >           LogicalJoin(condition=[true], joinType=[inner]): rowcount =
> 1.0E10, cumulative cost = 1.01010105E10, id = 17
> > >             LogicalJoin(condition=[true], joinType=[inner]): rowcount
> = 1.0E8, cumulative cost = 1.010104E8, id = 13
> > >               LogicalJoin(condition=[true], joinType=[inner]):
> rowcount = 1000000.0, cumulative cost = 1010300.0, id = 9
> > >                 LogicalJoin(condition=[true], joinType=[inner]):
> rowcount = 10000.0, cumulative cost = 10200.0, id = 5
> > >                   LogicalTableScan(table=[[postgres, char_name]]):
> rowcount = 100.0, cumulative cost = 100.0, id = 1
> > >                   LogicalTableScan(table=[[postgres, cast_info]]):
> rowcount = 100.0, cumulative cost = 100.0, id = 3
> > >                 LogicalTableScan(table=[[postgres, company_name]]):
> rowcount = 100.0, cumulative cost = 100.0, id = 7
> > >               LogicalTableScan(table=[[postgres, company_type]]):
> rowcount = 100.0, cumulative cost = 100.0, id = 11
> > >             LogicalTableScan(table=[[postgres, movie_companies]]):
> rowcount = 100.0, cumulative cost = 100.0, id = 15
> > >           LogicalTableScan(table=[[postgres, role_type]]): rowcount =
> 100.0, cumulative cost = 100.0, id = 19
> > >         LogicalTableScan(table=[[postgres, title]]): rowcount = 100.0,
> cumulative cost = 100.0, id = 23
> > >
> > >
> > > I hope this specifies my problem a bit more.
> > >
> > > Best,
> > > Juri
> > >
> > > On 2025/03/26 13:54:33 Alessandro Solimando wrote:
> > > > Hi Juri,
> > > > it's true that the tables in the joins are fully connected via the
> > > > predicates, but order matters and the concrete order I see can't do
> without
> > > > cartesian products: it's joining "company_type" with other tables
> before
> > > > joining with "movie_companies", but the only predicate in the where
> clause
> > > > around "company_type" is "ct.id = mc.company_type_id", which can't
> be used
> > > > in that subtree as "movie_companies" hasn't been joined yet, so
> basically
> > > > it's a join ordering "issue" (which could not be an issue at all
> based on
> > > > the size of the tables, selectivity of the predicates etc.).
> > > >
> > > > Are you using rules for join ordering like LoptOptimizeJoinRule
> > > > <
> https://github.com/apache/calcite/blob/bfbe8930f4ed7ba8da530e862e212a057191cfa3/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java
> >
> > > > in your program (the set of rules you use could help people provide a
> > > > better answer)? If you are using 1.39.0 there is a new join ordering
> > > > algorithm, you can refer to CALCITE-6846
> > > > <https://issues.apache.org/jira/browse/CALCITE-6846> and related PR
> for
> > > > more details which should be exhaustive.
> > > >
> > > > If you think you have added all the rules and you can't still get a
> sense
> > > > of why you end up with a particular plan, you can activate the
> extended
> > > > logs around rule applications and transformations to be able to then
> put
> > > > breakpoints in the involved rules at the specific step which is
> generally
> > > > tricky as rules are called multiple times. You can refer to these
> slides
> > > >
> https://www.slideshare.net/StamatisZampetakis/debugging-planning-issues-using-calcites-builtin-loggers
> > > > (there is also the full video and other links at
> > > > https://calcite.apache.org/community/, the talk is "Debugging
> planning
> > > > issues using Calcite’s built in loggers").
> > > >
> > > > Best regards,
> > > > Alessandro
> > > >
> > > > On Wed, 26 Mar 2025 at 11:10, Juri Petersen <j...@itu.dk.invalid>
> wrote:
> > > >
> > > > > Hi,
> > > > > As mentioned by Mads in a previous mail, we are working on a
> SQL-API in
> > > > > Apache Wayang.
> > > > > We are trying to set up experiments with the JOB Benchmark and see
> that we
> > > > > have to rewrite queries to explicit INNER JOINS for them to be
> parsed
> > > > > correctly.
> > > > > Since we are planning to do other benchmarks with thousands of
> queries,
> > > > > rewriting is not feasible.
> > > > >
> > > > > Given this (not-rewritten) query from JOB:
> > > > >
> > > > > SELECT MIN(chn.name) AS uncredited_voiced_character,
> > > > >        MIN(t.title) AS russian_movie
> > > > > FROM postgres.char_name AS chn,
> > > > >      postgres.cast_info AS ci,
> > > > >      postgres.company_name AS cn,
> > > > >      postgres.company_type AS ct,
> > > > >      postgres.movie_companies AS mc,
> > > > >      postgres.role_type AS rt,
> > > > >      postgres.title AS t
> > > > > WHERE ci.note LIKE '%(voice)%'
> > > > >   AND ci.note LIKE '%(uncredited)%'
> > > > >   AND cn.country_code = '[ru]'
> > > > >   AND rt.role = 'actor'
> > > > >   AND t.production_year > 2005
> > > > >   AND t.id = mc.movie_id
> > > > >   AND t.id = ci.movie_id
> > > > >   AND ci.movie_id = mc.movie_id
> > > > >   AND chn.id = ci.person_role_id
> > > > >   AND rt.id = ci.role_id
> > > > >   AND cn.id = mc.company_id
> > > > >   AND ct.id = mc.company_type_id;
> > > > >
> > > > > We use calcite to get the following tree:
> > > > >
> > > > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> > > > > russian_movie=[MIN($1)])
> > > > >   LogicalProject(name=[$1], title=[$31])
> > > > >     LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11,
> > > > > '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005),
> =($30,
> > > > > $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14,
> $25), =($21,
> > > > > $26))])
> > > > >       LogicalJoin(condition=[true], joinType=[inner])
> > > > >         LogicalJoin(condition=[true], joinType=[inner])
> > > > >           LogicalJoin(condition=[true], joinType=[inner])
> > > > >             LogicalJoin(condition=[true], joinType=[inner])
> > > > >               LogicalJoin(condition=[true], joinType=[inner])
> > > > >                 LogicalJoin(condition=[true], joinType=[inner])
> > > > >                   LogicalTableScan(table=[[postgres, char_name]])
> > > > >                   LogicalTableScan(table=[[postgres, cast_info]])
> > > > >                 LogicalTableScan(table=[[postgres, company_name]])
> > > > >               LogicalTableScan(table=[[postgres, company_type]])
> > > > >             LogicalTableScan(table=[[postgres, movie_companies]])
> > > > >           LogicalTableScan(table=[[postgres, role_type]])
> > > > >         LogicalTableScan(table=[[postgres, title]])
> > > > >
> > > > >
> > > > > I then try to apply the CoreRules.FILTER_INTO_JOIN (tried smart
> and dumb
> > > > > version), in order to avoid the cartesian products, hoping to push
> the join
> > > > > conditions into the respective LogicalJoins.
> > > > > Heres the resulting tree:
> > > > >
> > > > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)],
> > > > > russian_movie=[MIN($1)])
> > > > >   LogicalProject(name=[$1], title=[$31])
> > > > >     LogicalJoin(condition=[=($24, $30)], joinType=[inner])
> > > > >       LogicalJoin(condition=[=($28, $13)], joinType=[inner])
> > > > >         LogicalJoin(condition=[AND(=($9, $24), =($14, $25), =($21,
> $26))],
> > > > > joinType=[inner])
> > > > >           LogicalJoin(condition=[true], joinType=[inner])
> > > > >             LogicalJoin(condition=[true], joinType=[inner])
> > > > >               LogicalJoin(condition=[=($0, $10)], joinType=[inner])
> > > > >                 LogicalTableScan(table=[[postgres, char_name]])
> > > > >                 LogicalFilter(condition=[AND(LIKE($4, '%(voice)%'),
> > > > > LIKE($4, '%(uncredited)%'))])
> > > > >                   LogicalTableScan(table=[[postgres, cast_info]])
> > > > >               LogicalFilter(condition=[=($2, '[ru]')])
> > > > >                 LogicalTableScan(table=[[postgres, company_name]])
> > > > >             LogicalTableScan(table=[[postgres, company_type]])
> > > > >           LogicalTableScan(table=[[postgres, movie_companies]])
> > > > >         LogicalFilter(condition=[=($1, 'actor')])
> > > > >           LogicalTableScan(table=[[postgres, role_type]])
> > > > >       LogicalFilter(condition=[>($4, 2005)])
> > > > >         LogicalTableScan(table=[[postgres, title]])
> > > > >
> > > > > Some of the conditions are pushed down, but we still have remaining
> > > > > cartesian products and a multi-condition join.
> > > > > Looking at the input query, I would expect every Join to have a
> condition,
> > > > > as there are no unspecified joins, right?
> > > > > What am I missing or what can we do to deconstruct the
> multi-conditional
> > > > > join and avoid cartesian products?
> > > > >
> > > > > Thanks in advance for any help!
> > > > >
> > > > > Best,
> > > > > Juri
> > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to