Hello, I am in need of some help or guidance. We are currently trying to parse Join-Order-Benchmark queries into plans with Apache Calcite. The following query is our input:
SELECT MIN(t.title) AS movie_title FROM postgres.company_name AS cn, postgres.keyword AS k, postgres.movie_companies AS mc, postgres.movie_keyword AS mk, postgres.title AS t WHERE cn.country_code ='[de]' AND k.keyword ='character-name-in-title' AND cn.id = mc.company_id AND mc.movie_id = t.id AND t.id = mk.movie_id AND mk.keyword_id = k.id AND mc.movie_id = mk.movie_id; This is being parsed into the following RelNode: LogicalAggregate(group=[{}], movie_title=[MIN($0)]) LogicalProject(title=[$19]) LogicalFilter(condition=[AND(=($2, '[de]'), =($8, 'character-name-in-title'), =($0, $12), =($11, $18), =($18, $16), =($17, $7), =($11, $16))]) LogicalJoin(condition=[true], joinType=[inner]) LogicalJoin(condition=[true], joinType=[inner]) LogicalJoin(condition=[true], joinType=[inner]) LogicalJoin(condition=[true], joinType=[inner]) LogicalTableScan(table=[[postgres, company_name]]) LogicalTableScan(table=[[postgres, keyword]]) LogicalTableScan(table=[[postgres, movie_companies]]) LogicalTableScan(table=[[postgres, movie_keyword]]) LogicalTableScan(table=[[postgres, title]]) I can observe that there are only 4 joins at this point in time. After this, we optimize the plan with the following RuleSet: ( CoreRules.JOIN_PUSH_TRANSITIVE_PREDICATES, CoreRules.JOIN_CONDITION_PUSH, CoreRules.FILTER_INTO_JOIN, CoreRules.JOIN_ASSOCIATE ) However, the resuliting plan looks like this: WayangAggregate(group=[{}], movie_title=[MIN($0)]) WayangProject(title=[$19]) WayangJoin(condition=[=($11, $18)], joinType=[inner]) WayangJoin(condition=[=($0, $12)], joinType=[inner]) WayangFilter(condition=[=($2, '[de]')]) WayangTableScan(table=[[postgres, company_name]]) WayangJoin(condition=[=($10, $0)], joinType=[inner]) WayangFilter(condition=[=($1, 'character-name-in-title')]) WayangTableScan(table=[[postgres, keyword]]) WayangJoin(condition=[=($1, $6)], joinType=[inner]) WayangTableScan(table=[[postgres, movie_companies]]) WayangTableScan(table=[[postgres, movie_keyword]]) WayangTableScan(table=[[postgres, title]]) My problem with this is that the original Filter has more than 4 join conditions, but only 4 joins are being created and some of the Join conditions are never pushed down or applied. I am either just misunderstanding why this happens or am missing a rule to enforce the pushdown/usage of all conditions or instantiation of all joins. Can anybody give me some pointers? Thanks in advance for your help! Best, Juri