Calcite’s usual behavior is to split AVG(x) into SUM(x) / COUNT(x). (Actually something slightly more complicated, to deal with COUNT = 0.)
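As a rough illustration of why the split is useful (plain Java, not Calcite code; the class and method names below are made up for this sketch): SUM and COUNT partials can be merged across nodes, and the division happens once at the end, with a COUNT = 0 guard standing in for the "slightly more complicated" part. The sketch assumes integer inputs and returns a double for simplicity, which is not necessarily the type Calcite would derive.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: AVG computed from mergeable SUM/COUNT partials (hypothetical names). */
public class DistributedAvg {

  /** Partial aggregate from one node: SUM(c) and COUNT(c) over the non-null values. */
  static final class Partial {
    final long sum;
    final long count;

    Partial(long sum, long count) {
      this.sum = sum;
      this.count = count;
    }

    /** Combines two partials, as a coordinator would across nodes. */
    Partial merge(Partial other) {
      return new Partial(sum + other.sum, count + other.count);
    }
  }

  /** Per-node step: accumulate SUM and COUNT, skipping NULLs as SQL aggregates do. */
  static Partial aggregate(List<Integer> values) {
    long sum = 0;
    long count = 0;
    for (Integer v : values) {
      if (v != null) {
        sum += v;
        count++;
      }
    }
    return new Partial(sum, count);
  }

  /** Coordinator step: merge all partials, then divide. Returns null when COUNT = 0. */
  static Double finalAvg(List<Partial> partials) {
    Partial total = new Partial(0, 0);
    for (Partial p : partials) {
      total = total.merge(p);
    }
    return total.count == 0 ? null : (double) total.sum / total.count;
  }

  public static void main(String[] args) {
    Partial node1 = aggregate(Arrays.asList(10, 20, 30)); // sum = 60, count = 3
    Partial node2 = aggregate(Arrays.asList(40, null));   // sum = 40, count = 1
    System.out.println(finalAvg(Arrays.asList(node1, node2))); // prints 25.0
  }
}
```

Averaging the two per-node averages here (20 and 40) would give 30, not 25, which is why the partial results have to be SUM and COUNT rather than AVG.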
Normally, this split happens during sql-to-rel conversion, in the expandAvg method of StandardConvertletTable [1]. If you add a test for, say, “select avg(sal) from emp” to SqlToRelConverterTest, you should see that happening.

But if you are going straight to RelNode, there is also a rule you can use: AggregateReduceFunctionsRule [2].

Julian

[1] https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java?line=1194
[2] https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java?line=82

> On Jan 3, 2018, at 4:20 PM, Michael Alexeev <[email protected]> wrote:
>
> Hi All,
>
> Please consider a simple aggregated query
>
> select avg(C) from T;
>
> which is translated into a following relational expression
>
> LogicalAggregate(group=[{}], EXPR$0=[AVG($0)])
>   LogicalProject(C=[$2])
>     VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}],
> proj#0..5=[{exprs}])
>
> And this works perfectly fine if a table physically located on a single
> physical machine.
>
> But in case of a distributed table (partitioned table where each partition
> resides on a different physical node, for example) this query needs to be
> run on each individual node, each node's results are sent to a dedicated
> coordinator node which does the final aggregation across the nodes. To
> calculate a column's AVG in this case, each node has to calculate two
> aggregates SUM(T.C) and COUNT(T.C) and the coordinator node has also
> compute the SUM and COUNT across the nodes and divide the final SUM by the
> final COUNT.
>
> I wonder if there is a way to force Calcite to translate the query above
> into
> LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[COUNT($0)])
>   LogicalProject(C=[$2])
>     VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}],
> proj#0..5=[{exprs}])
>
> instead of a single AVG aggregate?
>
> Thanks,
> Mike
