Hello,

I am trying to improve my query planner, based on Hive's implementation of
the Calcite planner (
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java).
I have split my optimization procedure into stages, in a way similar to
Hive's planner. First, I apply some pre-join-ordering optimizations. Then I
use LoptOptimizeJoinRule.INSTANCE for join ordering, and finally I apply some
rules that don't need statistics to get my final plan. I face two problems:

1) When I have a query like this:
    "select * "
        + "from s.products join s.orders "
        + "on s.orders.productid = s.products.productid "
        + "where units > 10 and description < 20 "
    );

After applying LoptOptimizeJoinRule, I get this plan:
LogicalProject(rowtime=[$5], productid=[$6], description=[$7], rowtime0=[$0], orderid=[$1], productid0=[$2], units=[$3], customerid=[$4])
  LogicalJoin(condition=[=($6, $2)], joinType=[inner])
    LogicalFilter(condition=[>($3, 10)])
      LogicalTableScan(table=[[s, orders]])
    LogicalFilter(condition=[<($2, 20)])
      LogicalTableScan(table=[[s, products]])

The final plan has an extra projection on top of the join. This projection
is of no use to me, and I want to get rid of it.
I tried to create a rule that transforms project(join) -> join when they
have the same output schema, but I couldn't find out how to get the output
schema of the join operator. Am I doing something wrong with the order in
which I apply the rules, or with the way I apply them? Is there an easy way
to get rid of this top project?
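
A note on the "same output schema" test: the top project in the plan above
reads $5, $6, $7, $0, $1, $2, $3, $4, so it reorders the join's columns
rather than passing them through unchanged. Below is a minimal sketch of the
identity check such a rule would need, written in plain Java with no Calcite
classes. The names ProjectionCheck and isIdentity are hypothetical and used
only for illustration; this is not Calcite's API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Calcite: models a projection as the list
// of input column indexes it reads, in output order.
final class ProjectionCheck {

    // A projection is removable only if output column i is exactly input
    // column i, for every column of the input.
    static boolean isIdentity(List<Integer> projectedInputs, int inputFieldCount) {
        if (projectedInputs.size() != inputFieldCount) {
            return false; // the projection drops or duplicates columns
        }
        for (int i = 0; i < projectedInputs.size(); i++) {
            if (projectedInputs.get(i) != i) {
                return false; // the projection reorders columns
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Input references of the top project in the plan above.
        List<Integer> topProject = Arrays.asList(5, 6, 7, 0, 1, 2, 3, 4);
        System.out.println(isIdentity(topProject, 8)); // prints "false"

        // A project that passes all eight join columns through in order.
        List<Integer> passThrough = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(isIdentity(passThrough, 8)); // prints "true"
    }
}
```

Under this check the project above is not an identity: it restores the
products-then-orders column order of the original query after the join
inputs were swapped.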

2) After I have applied LoptOptimizeJoinRule and obtained my optimized join
order, I can't use JoinCommuteRule, because the HepPlanner runs forever.

Thank you in advance,
George
