Hello,
I am trying to improve my query planner based on Hive's implementation of
the Calcite planner (
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java).
I have split my optimization procedure into phases, similar to Hive's planner.
First, I apply some pre-join-ordering optimizations. Then I use
LoptOptimizeJoinRule.INSTANCE for join ordering, and finally I apply some
rules that don't need statistics to get my final plan. I face two problems:
1) When I have a query like this:
"select * "
+ "from s.products join s.orders "
+ "on s.orders.productid = s.products.productid "
+ " where units>10 and description < 20 "
);
I get this plan after applying LoptOptimizeJoinRule:
LogicalProject(rowtime=[$5], productid=[$6], description=[$7], rowtime0=[$0], orderid=[$1], productid0=[$2], units=[$3], customerid=[$4])
  LogicalJoin(condition=[=($6, $2)], joinType=[inner])
    LogicalFilter(condition=[>($3, 10)])
      LogicalTableScan(table=[[s, orders]])
    LogicalFilter(condition=[<($2, 20)])
      LogicalTableScan(table=[[s, products]])
The final plan has an extra Project on top of the Join. This projection
appears to serve no purpose, and I want to get rid of it.
I tried to create a rule that transforms project(join) -> join when the two
have the same output schema, but I couldn't find out how to get the output
schema of the join operator. Am I doing something wrong with the order or the
way I apply the rules? Is there an easy way to get rid of this top Project?
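
The check such a rule would need can be sketched in plain Java, without
Calcite types. This is only an illustration under assumptions: the class
ProjectionCheck and its method are hypothetical names, the projection is
modeled as the list of input field indexes it references (so $5 becomes 5),
and field names and types are ignored. In Calcite itself, the join's output
schema should be available through getRowType(), which every RelNode provides.

```java
// Hypothetical helper in plain Java; it models a projection as the input
// field indexes it references ($5 -> 5), not as Calcite RexNode objects.
final class ProjectionCheck {
    private ProjectionCheck() {}

    /** Returns true when output i of the projection is exactly input field i, for every field. */
    static boolean isIdentity(int[] projectedRefs, int inputFieldCount) {
        if (projectedRefs.length != inputFieldCount) {
            return false; // the projection drops or adds columns
        }
        for (int i = 0; i < projectedRefs.length; i++) {
            if (projectedRefs[i] != i) {
                return false; // output i does not come from input field i
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Field references of the top LogicalProject in the plan above;
        // the join below it produces 8 fields (5 from orders, 3 from products).
        int[] topProject = {5, 6, 7, 0, 1, 2, 3, 4};
        System.out.println(isIdentity(topProject, 8)); // prints false

        // A projection that keeps the join's own column order would be removable.
        int[] sameOrder = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(isIdentity(sameOrder, 8)); // prints true
    }
}
```

Under this model the top Project in the plan above is not an identity
mapping: it lists the products columns before the orders columns, while the
join emits orders first. If that reading is right, a project(join) -> join
rule would change the column order of the result, so the projection may be
needed to keep the original "select *" order.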
2) After I have applied LoptOptimizeJoinRule and obtained my optimized join
order, I can't use JoinCommuteRule, because the HepPlanner runs forever.
Thank you in advance,
George