I'm no Calcite expert (yet), but I have a few suggestions based on my own experience using Planner and digging through its code. Keep in mind that there are surely better people around here to explain this, but I'll do my best based on what I've learned...
When using Planner, you shouldn't need to access the underlying RelOptPlanner (or VolcanoPlanner) directly. Instead, you add Programs that use specific planners, e.g. Programs.ofRules(rules) for Volcano or Programs.hep for heuristic rules; check out the Programs class for all the options. Use the FrameworkConfig builder's programs() and traitDefs() methods, then call Planner.transform(int, RelTraitSet, RelNode) to apply rules in multiple phases. If you add two Programs to the Planner, the first will be run by transform(0, traits, rel) and the second by transform(1, traits, rel). The Planner internally handles calling changeTraits and findBestExp. You can find examples of this usage of Planner in the tests.

I haven't seen great documentation on conventions, but my understanding is that each RelNode has a set of input and output traits, one of which can be a calling Convention. For one node to be the input of another, that node's output traits must match the other node's input traits, and the goal of the planner is to make the root node's output trait set match the desired trait set by applying rules to the tree. The reason EnumerableConvention works is that there are Enumerable converter rules associated with that convention which handle converting each RelNode to that convention, so that all input/output conventions are compatible with one another and the root node ends up with the Enumerable convention.

Some rules operate on Logical nodes (e.g. filter push-down, join reordering), while others convert to or between calling conventions. Were you to set a different, custom calling convention, that convention would need its own set of converter rules capable of converting each node in the tree for planning to be successful; otherwise you could not convert from Convention.NONE to your custom convention. Some rules can also convert between two calling conventions.
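To make the multi-phase Planner usage concrete, here is a minimal sketch (untested, and not tied to any particular Calcite version; `rootSchema`, the SQL string, and the specific rule choices are placeholders, not recommendations): a heuristic program runs first, then a Volcano-driven program converts the plan to the Enumerable convention.

```java
import org.apache.calcite.adapter.enumerable.EnumerableConvention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
import org.apache.calcite.rel.rules.FilterSetOpTransposeRule;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.apache.calcite.tools.Programs;

import java.util.Collections;

public class TwoPhasePlanning {
  RelNode plan(SchemaPlus rootSchema, String sql) throws Exception {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)
        // Program 0: heuristic (Hep) pass for logical rewrites.
        // Program 1: cost-based (Volcano) pass over a rule set that
        // includes the Enumerable converter rules.
        .programs(
            Programs.hep(
                Collections.singletonList(FilterSetOpTransposeRule.INSTANCE),
                false, DefaultRelMetadataProvider.INSTANCE),
            Programs.ofRules(Programs.RULE_SET))
        .build();

    Planner planner = Frameworks.getPlanner(config);
    SqlNode parsed = planner.parse(sql);
    SqlNode validated = planner.validate(parsed);
    RelNode logical = planner.rel(validated).project();

    // Phase 0: run the Hep program, keeping the current traits.
    RelNode rewritten = planner.transform(0, logical.getTraitSet(), logical);

    // Phase 1: run the Volcano program, requesting the Enumerable
    // convention at the root; the Planner calls changeTraits and
    // findBestExp internally.
    RelTraitSet desired = rewritten.getTraitSet()
        .replace(EnumerableConvention.INSTANCE).simplify();
    return planner.transform(1, desired, rewritten);
  }
}
```

The point of the two-program split is that each transform(i, ...) call is an independent phase, so logical rewrites can be applied exhaustively before the cost-based search begins.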
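For a custom calling convention, the converter rules look roughly like the sketch below. MyConvention, MyRel, and MyFilter are hypothetical names I'm inventing for illustration (they do not exist in Calcite); the pattern mirrors what the Enumerable and Jdbc rules do for their conventions.

```java
import org.apache.calcite.plan.Convention;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalFilter;

// Hypothetical calling convention; MyRel would be the interface that
// every physical node of this convention implements.
public class MyConvention extends Convention.Impl {
  public static final MyConvention INSTANCE = new MyConvention();

  private MyConvention() {
    super("MY", MyRel.class);
  }
}

// Converter rule that moves a LogicalFilter from Convention.NONE into
// the MY convention. Without a rule like this for each kind of node in
// the tree, the planner cannot satisfy a root trait set that asks for
// MyConvention.INSTANCE, and planning fails with a CannotPlanException.
class MyFilterRule extends ConverterRule {
  static final MyFilterRule INSTANCE = new MyFilterRule();

  private MyFilterRule() {
    super(LogicalFilter.class, Convention.NONE, MyConvention.INSTANCE,
        "MyFilterRule");
  }

  @Override public RelNode convert(RelNode rel) {
    final LogicalFilter filter = (LogicalFilter) rel;
    // MyFilter is the hypothetical physical counterpart of LogicalFilter.
    // The input is also converted to MY, so input/output conventions match.
    return new MyFilter(
        filter.getCluster(),
        filter.getTraitSet().replace(MyConvention.INSTANCE),
        convert(filter.getInput(),
            filter.getInput().getTraitSet().replace(MyConvention.INSTANCE)),
        filter.getCondition());
  }
}
```

Registering one such rule per node type (scan, project, filter, ...) is what makes the whole-tree conversion reachable for the planner.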
For example, the JdbcRules include a JdbcToEnumerableConverterRule that converts from the JDBC convention to Enumerable. In practice, this represents the bridge between a JDBC query and Calcite's enumerables: the JdbcToEnumerableConverter is the point where the JDBC query is compiled and run, and the results are passed to the EnumerableRels to complete any query logic that could not be pushed down. Even if an entire query can be pushed down, the root will still have a JdbcToEnumerableConverter, since the planner expects the Enumerable convention trait at the root of the tree.

Hope my limited knowledge helps :-)

> On Jun 27, 2016, at 10:10 PM, Ravikumar CS <[email protected]> wrote:
>
> Hi,
>
> I am trying to get the VolcanoPlanner working. I took the simple planner
> from Milinda[1] and built a modified planner[2] which uses the Volcano
> planner for optimization.
>
> The table that I am using is CSVFilterableTable[3]. However, the Volcano
> planner fails to optimize, with the following error[4].
>
> Questions:
>
> 1. It works when I explicitly set the EnumerableConvention (lines 95-98).
> In that case the rules seem to fire, and I get back a plan in the
> Enumerable convention. Is that expected?
>
> 2. If I want to take the initial logical plan and generate an optimized
> logical plan, how can I achieve this using the VolcanoPlanner (just the
> way it worked using the HepPlanner)?
>
> 3. Am I missing any crucial planner rules?
>
> 4. I want to understand more about the Convention concept and how it
> relates to the Planner. Is there documentation that I can go through?
>
> ~Ravi
>
> [1] https://github.com/milinda/calcite-tutorial/blob/master/src/main/java/org/pathirage/calcite/tutorial/planner/SimpleQueryPlanner.java
>
> [2] BasicQueryPlanner with Volcano:
>     Script: https://gist.github.com/ravikumarcs/724b7cbb1053a1650664aabc6eeb7271
>     Output: https://gist.github.com/ravikumarcs/d0d50c414cae47be18f45e57a58749dd
>
> [3] Model: https://github.com/apache/calcite/blob/master/example/csv/src/test/resources/filterable-model.json
>
> [4] VolcanoPlanner failure: https://gist.github.com/ravikumarcs/10b53d47ad0bd1037436eab7c342c048
>
>> On Thu, Jun 2, 2016 at 2:33 PM, Ravikumar CS <[email protected]> wrote:
>>
>> You are right. I changed the order of the rules and it worked. Thanks Julian.
>>
>> Rule order: FilterSetOpTransposeRule -> AggregateReduceFunctionsRule ->
>> AggregateUnionTransposeRule
>>
>> New plan:
>>
>> LogicalProject(id=[$0], EXPR$1=[CAST(/($1, $2)):INTEGER NOT NULL])
>>   LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[$SUM0($2)])
>>     LogicalUnion(all=[true])
>>       LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT()])
>>         LogicalFilter(condition=[=($0, 1)])
>>           LogicalProject(id=[$0], units=[$2])
>>             LogicalTableScan(table=[[SALES, Orders1]])
>>       LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT()])
>>         LogicalFilter(condition=[=($0, 1)])
>>           LogicalProject(id=[$0], units=[$2])
>>             LogicalTableScan(table=[[SALES, Orders2]])
>>
>>> On Thu, Jun 2, 2016 at 2:07 PM, Julian Hyde <[email protected]> wrote:
>>>
>>> Why do you want to enable AggregateUnionAggregateRule[1]? It is doing
>>> the opposite of AggregateUnionTransposeRule.
>>>
>>> Julian
>>>
>>> [1] https://calcite.apache.org/apidocs/org/apache/calcite/rel/rules/AggregateUnionAggregateRule.html
>>>
>>> On Thu, Jun 2, 2016 at 2:03 PM, Ravikumar CS <[email protected]> wrote:
>>>>
>>>> Thanks Julian. FilterSetOpTransposeRule worked for pushing the filter
>>>> into the union.
>>>>
>>>> However, the partial aggregate logic doesn't seem to work even after
>>>> adding the rules AggregateUnionTransposeRule and AggregateUnionAggregateRule.
>>>>
>>>> *Query:* SELECT id, SUM(units) FROM (SELECT id, units FROM Orders1 UNION
>>>> ALL SELECT id, units FROM Orders2) WHERE id=1 GROUP BY id
>>>>
>>>> *Plan:*
>>>>
>>>> LogicalAggregate(group=[{0}], EXPR$1=[$SUM0($1)])
>>>>   LogicalUnion(all=[true])
>>>>     LogicalFilter(condition=[=($0, 1)])
>>>>       LogicalProject(id=[$0], units=[$2])
>>>>         LogicalTableScan(table=[[SALES, Orders1]])
>>>>     LogicalFilter(condition=[=($0, 1)])
>>>>       LogicalProject(id=[$0], units=[$2])
>>>>         LogicalTableScan(table=[[SALES, Orders2]])
>>>>
>>>> ~Ravi
>>>>
>>>>> On Thu, Jun 2, 2016 at 12:11 PM, Julian Hyde <[email protected]> wrote:
>>>>>
>>>>> I logged https://issues.apache.org/jira/browse/CALCITE-1271.
>>>>>
>>>>>> On Thu, Jun 2, 2016 at 12:11 PM, Julian Hyde <[email protected]> wrote:
>>>>>>
>>>>>> By the way, I noticed that FilterSetOpTransposeRule and
>>>>>> AggregateUnionTransposeRule are not part of the default rule set. We
>>>>>> should fix that.
>>>>>>
>>>>>>> On Thu, Jun 2, 2016 at 10:20 AM, Julian Hyde <[email protected]> wrote:
>>>>>>>
>>>>>>> You need to push the Filter into the Union. Otherwise the Aggregate
>>>>>>> is on top of a Filter, not a Union. Use FilterSetOpTransposeRule.
>>>>>>>
>>>>>>> Julian
>>>>>>>
>>>>>>>> On Jun 2, 2016, at 9:54 AM, Ravikumar CS <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to come up with an optimized relational expression which
>>>>>>>> does predicate push-downs and partial aggregates in preparation for
>>>>>>>> a distributed execution of the query.
>>>>>>>>
>>>>>>>> SQL of interest:
>>>>>>>>
>>>>>>>> SELECT col1, SUM(col2)
>>>>>>>> FROM (SELECT col1, col2 FROM Orders1
>>>>>>>>       UNION ALL
>>>>>>>>       SELECT col1, col2 FROM Orders2)
>>>>>>>> WHERE col1=1
>>>>>>>> GROUP BY col1;
>>>>>>>>
>>>>>>>> Relational expression - initial:
>>>>>>>>
>>>>>>>> LogicalAggregate(group=[{0}], EXPR$1=[AVG($1)])
>>>>>>>>   LogicalFilter(condition=[=($0, 1)])
>>>>>>>>     LogicalUnion(all=[true])
>>>>>>>>       LogicalProject(id=[$0], units=[$2])
>>>>>>>>         LogicalTableScan(table=[[SALES, Orders1]])
>>>>>>>>       LogicalProject(id=[$0], units=[$2])
>>>>>>>>         LogicalTableScan(table=[[SALES, Orders2]])
>>>>>>>>
>>>>>>>> Query optimization rules applied:
>>>>>>>>
>>>>>>>> HepProgram program = new HepProgramBuilder()
>>>>>>>>     .addRuleInstance(AggregateUnionAggregateRule.INSTANCE)
>>>>>>>>     .addRuleInstance(AggregateUnionTransposeRule.INSTANCE)
>>>>>>>>     .addRuleInstance(AggregateReduceFunctionsRule.INSTANCE)
>>>>>>>>     .build();
>>>>>>>> HepPlanner planner = new HepPlanner(program);
>>>>>>>> planner.setRoot(oldLogicalPlan);
>>>>>>>> RelNode newLogicalPlan = planner.findBestExp();
>>>>>>>>
>>>>>>>> Relational expression after query optimization:
>>>>>>>>
>>>>>>>> LogicalProject(id=[$0], EXPR$1=[CAST(/($1, $2)):INTEGER NOT NULL])
>>>>>>>>   LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT()])
>>>>>>>>     LogicalFilter(condition=[=($0, 1)])
>>>>>>>>       LogicalUnion(all=[true])
>>>>>>>>         LogicalProject(id=[$0], units=[$2])
>>>>>>>>           LogicalTableScan(table=[[SALES, Orders1]])
>>>>>>>>         LogicalProject(id=[$0], units=[$2])
>>>>>>>>           LogicalTableScan(table=[[SALES, Orders2]])
>>>>>>>>
>>>>>>>> Questions:
>>>>>>>>
>>>>>>>> 1. Are there any rules to push the predicate (col1=1) down to the
>>>>>>>> subqueries?
>>>>>>>>
>>>>>>>> 2. How can I rewrite the query such that the partial aggregates are
>>>>>>>> computed within each branch of the union (as below)? I tried
>>>>>>>> AggregateUnionTransposeRule and AggregateJoinTransposeRule; maybe I
>>>>>>>> missed something.
>>>>>>>>
>>>>>>>> SELECT col1, SUM(partialCol2) AS c
>>>>>>>> FROM (SELECT col1, SUM(col2) AS partialCol2 FROM Orders1
>>>>>>>>       WHERE col1=1 GROUP BY deptno
>>>>>>>>       UNION ALL
>>>>>>>>       SELECT col1, SUM(col2) AS partialCol2 FROM Orders2
>>>>>>>>       WHERE col1=1 GROUP BY deptno
>>>>>>>> ) GROUP BY col1;
