Hi, Thank you for your answer! I see your point about join ordering, thats also why I tried using the MULTI_JOIN_OPTIMZE CoreRule before. I tried it again just now, and these rules still don't resolve my problem:
final RuleSet rules = RuleSets.ofList( CoreRules.FILTER_INTO_JOIN, CoreRules.MULTI_JOIN_OPTIMIZE ); I tried both the smart and dumb FILTER_INTO_JOIN and also the bushy version of MULTI_JOIN_OPTIMIZE. The message I get when trying to optimize the plan is the following: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=NONE. All the inputs have relevant nodes, however the cost is still infinite. Root: rel#55:RelSubset#15.NONE Original rel: LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)], russian_movie=[MIN($1)]): rowcount = 1.0, cumulative cost = 1.0101010125097225E14, id = 30 LogicalProject(name=[$1], title=[$31]): rowcount = 120135.49804687499, cumulative cost = 1.01010101250971E14, id = 29 LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11, '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005), =($30, $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25), =($21, $26))]): rowcount = 120135.49804687499, cumulative cost = 1.010101011308355E14, id = 26 LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0E14, cumulative cost = 1.010101010107E14, id = 25 LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0E12, cumulative cost = 1.0101010106E12, id = 21 LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0E10, cumulative cost = 1.01010105E10, id = 17 LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0E8, cumulative cost = 1.010104E8, id = 13 LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1000000.0, cumulative cost = 1010300.0, id = 9 LogicalJoin(condition=[true], joinType=[inner]): rowcount = 10000.0, cumulative cost = 10200.0, id = 5 LogicalTableScan(table=[[postgres, char_name]]): rowcount = 100.0, cumulative cost = 100.0, id = 1 LogicalTableScan(table=[[postgres, cast_info]]): rowcount = 100.0, cumulative cost = 100.0, id = 3 LogicalTableScan(table=[[postgres, company_name]]): rowcount = 100.0, cumulative cost = 100.0, id = 7 LogicalTableScan(table=[[postgres, company_type]]): rowcount = 100.0, cumulative cost = 100.0, id = 11 LogicalTableScan(table=[[postgres, movie_companies]]): rowcount = 100.0, cumulative cost = 100.0, id = 15 LogicalTableScan(table=[[postgres, role_type]]): rowcount = 100.0, cumulative cost = 100.0, id = 19 LogicalTableScan(table=[[postgres, title]]): rowcount = 100.0, cumulative cost = 100.0, id = 23 I hope this specifies my problem a bit more. Best, Juri On 2025/03/26 13:54:33 Alessandro Solimando wrote: > Hi Juri, > it's true that the tables in the joins are fully connected via the > predicates, but order matters and the concrete order I see can't do without > cartesian products: it's joining "company_type" with other tables before > joining with "movie_companies", but the only predicate in the where clause > around "company_type" is "ct.id = mc.company_type_id", which can't be used > in that subtree as "movie_companies" hasn't been joined yet, so basically > it's a join ordering "issue" (which could not be an issue at all based on > the size of the tables, selectivity of the predicates etc.). > > Are you using rules for join ordering like LoptOptimizeJoinRule > <https://github.com/apache/calcite/blob/bfbe8930f4ed7ba8da530e862e212a057191cfa3/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java> > in your program (the set of rules you use could help people provide a > better answer)? If you are using 1.39.0 there is a new join ordering > algorithm, you can refer to CALCITE-6846 > <https://issues.apache.org/jira/browse/CALCITE-6846> and related PR for > more details which should be exhaustive. > > If you think you have added all the rules and you can't still get a sense > of why you end up with a particular plan, you can activate the extended > logs around rule applications and transformations to be able to then put > breakpoints in the involved rules at the specific step which is generally > tricky as rules are called multiple times. You can refer to these slides > https://www.slideshare.net/StamatisZampetakis/debugging-planning-issues-using-calcites-builtin-loggers > (there is also the full video and other links at > https://calcite.apache.org/community/, the talk is "Debugging planning > issues using Calcite’s built in loggers"). > > Best regards, > Alessandro > > On Wed, 26 Mar 2025 at 11:10, Juri Petersen <j...@itu.dk.invalid> wrote: > > > Hi, > > As mentioned by Mads in a previous mail, we are working on a SQL-API in > > Apache Wayang. > > We are trying to set up experiments with the JOB Benchmark and see that we > > have to rewrite queries to explicit INNER JOINS for them to be parsed > > correctly. > > Since we are planning to do other benchmarks with thousands of queries, > > rewriting is not feasible. > > > > Given this (not-rewritten) query from JOB: > > > > SELECT MIN(chn.name) AS uncredited_voiced_character, > > MIN(t.title) AS russian_movie > > FROM postgres.char_name AS chn, > > postgres.cast_info AS ci, > > postgres.company_name AS cn, > > postgres.company_type AS ct, > > postgres.movie_companies AS mc, > > postgres.role_type AS rt, > > postgres.title AS t > > WHERE ci.note LIKE '%(voice)%' > > AND ci.note LIKE '%(uncredited)%' > > AND cn.country_code = '[ru]' > > AND rt.role = 'actor' > > AND t.production_year > 2005 > > AND t.id = mc.movie_id > > AND t.id = ci.movie_id > > AND ci.movie_id = mc.movie_id > > AND chn.id = ci.person_role_id > > AND rt.id = ci.role_id > > AND cn.id = mc.company_id > > AND ct.id = mc.company_type_id; > > > > We use calcite to get the following tree: > > > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)], > > russian_movie=[MIN($1)]) > > LogicalProject(name=[$1], title=[$31]) > > LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11, > > '%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005), =($30, > > $24), =($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25), =($21, > > $26))]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalTableScan(table=[[postgres, char_name]]) > > LogicalTableScan(table=[[postgres, cast_info]]) > > LogicalTableScan(table=[[postgres, company_name]]) > > LogicalTableScan(table=[[postgres, company_type]]) > > LogicalTableScan(table=[[postgres, movie_companies]]) > > LogicalTableScan(table=[[postgres, role_type]]) > > LogicalTableScan(table=[[postgres, title]]) > > > > > > I then try to apply the CoreRules.FILTER_INTO_JOIN (tried smart and dumb > > version), in order to avoid the cartesian products, hoping to push the join > > conditions into the respective LogicalJoins. > > Heres the resulting tree: > > > > LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)], > > russian_movie=[MIN($1)]) > > LogicalProject(name=[$1], title=[$31]) > > LogicalJoin(condition=[=($24, $30)], joinType=[inner]) > > LogicalJoin(condition=[=($28, $13)], joinType=[inner]) > > LogicalJoin(condition=[AND(=($9, $24), =($14, $25), =($21, $26))], > > joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[true], joinType=[inner]) > > LogicalJoin(condition=[=($0, $10)], joinType=[inner]) > > LogicalTableScan(table=[[postgres, char_name]]) > > LogicalFilter(condition=[AND(LIKE($4, '%(voice)%'), > > LIKE($4, '%(uncredited)%'))]) > > LogicalTableScan(table=[[postgres, cast_info]]) > > LogicalFilter(condition=[=($2, '[ru]')]) > > LogicalTableScan(table=[[postgres, company_name]]) > > LogicalTableScan(table=[[postgres, company_type]]) > > LogicalTableScan(table=[[postgres, movie_companies]]) > > LogicalFilter(condition=[=($1, 'actor')]) > > LogicalTableScan(table=[[postgres, role_type]]) > > LogicalFilter(condition=[>($4, 2005)]) > > LogicalTableScan(table=[[postgres, title]]) > > > > Some of the conditions are pushed down, but we still have remaining > > cartesian products and a multi-condition join. > > Looking at the input query, I would expect every Join to have a condition, > > as there are no unspecified joins, right? > > What am I missing or what can we do to deconstruct the multi-conditional > > join and avoid cartesian products? > > > > Thanks in advance for any help! > > > > Best, > > Juri > > > > >