Hi,
I am trying to come up with an optimized relational expression which does
predicate push downs
& partial aggregates in preparation for a distributed execution of the
query.
SQL of interest:
SELECT col1, SUM(col2) FROM ( SELECT col2, col2 FROM Orders1
UNION ALL
SELECT col1, col2
FROM Orders2
) WHERE col1=1 GROUP
BY col1;
Relational Expression - Initial:
LogicalAggregate(group=[{0}], EXPR$1=[AVG($1)])
LogicalFilter(condition=[=($0, 1)])
LogicalUnion(all=[true])
LogicalProject(id=[$0], units=[$2])
LogicalTableScan(table=[[SALES, Orders1]])
LogicalProject(id=[$0], units=[$2])
LogicalTableScan(table=[[SALES, Orders2]])
Query Optimization rules applied:
HepProgram program = new HepProgramBuilder()
.addRuleInstance(AggregateUnionAggregateRule.INSTANCE)
.addRuleInstance(AggregateUnionTransposeRule.INSTANCE)
.addRuleInstance(AggregateReduceFunctionsRule.INSTANCE)
.build();
HepPlanner planner = new HepPlanner(program);
planner.setRoot(oldLogicalPlan);
Relational Expression after Query Optimization: RelNode newLogicalPlan =
planner.findBestExp();
LogicalProject(id=[$0], EXPR$1=[CAST(/($1, $2)):INTEGER NOT NULL])
LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT()])
LogicalFilter(condition=[=($0, 1)])
LogicalUnion(all=[true])
LogicalProject(id=[$0], units=[$2])
LogicalTableScan(table=[[SALES, Orders1]])
LogicalProject(id=[$0], units=[$2])
LogicalTableScan(table=[[SALES, Orders2]])
Questions:
1. Are there any rules to push the predicates(col1=1) down to the sub
queries ?
2. How can I rewrite the query such that the partial aggregates are
computed within each union(as below)?
Tried AggregateUnionTransposeRule and AggregateJoinTransposeRule. May be
I am missed something.
SELECT col1, SUM(partialCol2) AS c
FROM ( SELECT col1, SUM(col2) AS partialCol2 FROM Orders1 where
col1=1 GROUP BY deptno
UNION ALL
SELECT col1, SUM(col2) AS partialCol2 FROM Orders2
where col1=1 GROUP BY deptno
) GROUP BY col1;