Hi All,

Please consider a simple aggregate query:

select avg(C) from T;

which is translated into the following relational expression:

LogicalAggregate(group=[{}], EXPR$0=[AVG($0)])
  LogicalProject(C=[$2])
    VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}], proj#0..5=[{exprs}])

This works fine when the table is physically located on a single machine.

But for a distributed table (for example, a partitioned table whose
partitions reside on different physical nodes), the query has to run on
each individual node, and each node's results are sent to a dedicated
coordinator node that performs the final aggregation across the nodes. To
calculate a column's AVG in this case, each node has to calculate two
aggregates, SUM(T.C) and COUNT(T.C). The coordinator node then has to add
up the partial sums and the partial counts across the nodes and divide the
final SUM by the final COUNT. (Averaging the per-node averages would be
wrong whenever the nodes hold different numbers of rows.)
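To make the two-phase scheme above concrete, here is a minimal Java sketch
using only the standard library. It is not Calcite or VoltDB code: the class
TwoPhaseAvg, its Partial holder, and the methods partial and finalAvg are
hypothetical names, and a column is modelled as a List<Long> in which null
stands for SQL NULL. Each node produces a (sum, count) pair; the coordinator
adds the pairs up and divides only once.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Calcite or VoltDB code) of a two-phase AVG:
 * each node computes a partial SUM and COUNT, the coordinator merges them.
 */
public final class TwoPhaseAvg {

    /** Partial aggregate emitted by one node: SUM(C) and COUNT(C). */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Phase 1, run on every node over its local rows of column C. */
    static Partial partial(List<Long> column) {
        long sum = 0;
        long count = 0;
        for (Long value : column) {
            if (value != null) { // SUM(C) and COUNT(C) skip NULLs
                sum += value;
                count++;
            }
        }
        return new Partial(sum, count);
    }

    /** Phase 2, run on the coordinator: add up sums and counts, divide once. */
    static Double finalAvg(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        // AVG over no non-NULL values is NULL, modelled here as a null Double.
        return count == 0 ? null : (double) sum / count;
    }

    public static void main(String[] args) {
        // Two hypothetical partitions of T.C living on two different nodes.
        Partial node1 = partial(Arrays.asList(1L, 2L, 3L));
        Partial node2 = partial(Arrays.asList(4L, null));
        System.out.println(finalAvg(List.of(node1, node2))); // prints 2.5
    }
}
```

With partitions {1, 2, 3} and {4, NULL}, the coordinator receives (6, 3) and
(4, 1), so AVG = 10 / 4 = 2.5, whereas averaging the per-node averages would
give (2 + 4) / 2 = 3.0.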

I wonder if there is a way to force Calcite to translate the query above
into the following plan, with separate SUM and COUNT aggregates instead of
a single AVG aggregate:

LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[COUNT($0)])
  LogicalProject(C=[$2])
    VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}], proj#0..5=[{exprs}])

Thanks,
Mike
