Hello,
I am in need of some help or guidance.
We are currently trying to parse Join-Order-Benchmark queries into plans with 
Apache Calcite.
The following query is our input:

SELECT MIN(t.title) AS movie_title
FROM postgres.company_name AS cn,
     postgres.keyword AS k,
     postgres.movie_companies AS mc,
     postgres.movie_keyword AS mk,
     postgres.title AS t
WHERE cn.country_code ='[de]'
  AND k.keyword ='character-name-in-title'
  AND cn.id = mc.company_id
  AND mc.movie_id = t.id
  AND t.id = mk.movie_id
  AND mk.keyword_id = k.id
  AND mc.movie_id = mk.movie_id;

This is being parsed into the following RelNode:

LogicalAggregate(group=[{}], movie_title=[MIN($0)])
  LogicalProject(title=[$19])
    LogicalFilter(condition=[AND(=($2, '[de]'), =($8, 
'character-name-in-title'), =($0, $12), =($11, $18), =($18, $16), =($17, $7), 
=($11, $16))])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner])
            LogicalJoin(condition=[true], joinType=[inner])
              LogicalTableScan(table=[[postgres, company_name]])
              LogicalTableScan(table=[[postgres, keyword]])
            LogicalTableScan(table=[[postgres, movie_companies]])
          LogicalTableScan(table=[[postgres, movie_keyword]])
        LogicalTableScan(table=[[postgres, title]])

I can observe that there are only 4 joins at this point in time. After this, we 
optimize the plan with the following RuleSet:

(
            CoreRules.JOIN_PUSH_TRANSITIVE_PREDICATES,
            CoreRules.JOIN_CONDITION_PUSH,
            CoreRules.FILTER_INTO_JOIN,
            CoreRules.JOIN_ASSOCIATE
)

However, the resuliting plan looks like this:

WayangAggregate(group=[{}], movie_title=[MIN($0)])
  WayangProject(title=[$19])
    WayangJoin(condition=[=($11, $18)], joinType=[inner])
      WayangJoin(condition=[=($0, $12)], joinType=[inner])
        WayangFilter(condition=[=($2, '[de]')])
          WayangTableScan(table=[[postgres, company_name]])
        WayangJoin(condition=[=($10, $0)], joinType=[inner])
          WayangFilter(condition=[=($1, 'character-name-in-title')])
            WayangTableScan(table=[[postgres, keyword]])
          WayangJoin(condition=[=($1, $6)], joinType=[inner])
            WayangTableScan(table=[[postgres, movie_companies]])
            WayangTableScan(table=[[postgres, movie_keyword]])
      WayangTableScan(table=[[postgres, title]])

My problem with this is that the original Filter has more than 4 join 
conditions, but only 4 joins are being created and some of the Join conditions 
are never pushed down or applied. 
I am either just misunderstanding why this happens or am missing a rule to 
enforce the pushdown/usage of all conditions or instantiation of all joins.
Can anybody give me some pointers?

Thanks in advance for your help!

Best,
Juri

Reply via email to