Calcite’s usual behavior is to split AVG(x) into SUM(x) / COUNT(x). (Actually 
something slightly more complicated, to deal with COUNT = 0.)

Normally, this happens during sql-to-rel conversion: the expandAvg method in 
StandardConvertletTable[1]. If you add a test for say “select avg(sal) from 
emp” to SqlToRelConvererTest you should see that happening.

But if you are going straight to RelNode, there is also a rule you can use: 
AggregateReduceFunctionsRule [2].

Julian

[1] 
https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java?line=1194
 
<https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java?line=1194>
 

[2] 
https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java?line=82
 
<https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java?line=82>


> On Jan 3, 2018, at 4:20 PM, Michael Alexeev <[email protected]> wrote:
> 
> Hi All,
> 
> Please consider a simple aggregated query
> 
> select avg(C) from T;
> 
> which is translated into a following relational expression
> 
> LogicalAggregate(group=[{}], EXPR$0=[AVG($0)])
>  LogicalProject(C=[$2])
>    VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}],
> proj#0..5=[{exprs}])
> 
> And this works perfectly fine if a table physically located on a single
> physical machine.
> 
> But in case of a distributed table (partitioned table where each partition
> resides on a different physical node, for example) this query needs to be
> run on each individual node, each node's results are sent to a dedicated
> coordinator node which does the final aggregation across the nodes. To
> calculate a column's AVG  in this case, each node has to calculate two
> aggregates SUM(T.C) and COUNT(T.C) and the coordinator node has also
> compute the SUM and COUNT across the nodes and divide the final SUM by the
> final COUNT.
> 
> I wonder if there is a way to force Calcite to translate the query above
> into
> LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[COUNT($0)])
>  LogicalProject(C=[$2])
>    VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}],
> proj#0..5=[{exprs}])
> 
> instead of a single AVG aggregate?
> 
> Thanks,
> Mike

Reply via email to