I actually tried to add the AggregateReduceFunctionsRule but the cumulative
cost of the SUM / COUNT expression was always higher than the cost of the
original AGR making it a winner.

Not sure how to get around it.

According to the documentation, The AvgVarianceConvertlet.expandAvg is
invoked in case of windowed aggregate functions and not the regular aggregates
like in my case.

Mike

On Wed, Jan 3, 2018 at 9:15 PM, Julian Hyde <[email protected]> wrote:

> Calcite’s usual behavior is to split AVG(x) into SUM(x) / COUNT(x).
> (Actually something slightly more complicated, to deal with COUNT = 0.)
>
> Normally, this happens during sql-to-rel conversion: the expandAvg method
> in StandardConvertletTable[1]. If you add a test for say “select avg(sal)
> from emp” to SqlToRelConvererTest you should see that happening.
>
> But if you are going straight to RelNode, there is also a rule you can
> use: AggregateReduceFunctionsRule [2].
>
> Julian
>
> [1] https://insight.io/github.com/apache/calcite/blob/master/
> core/src/main/java/org/apache/calcite/sql2rel/
> StandardConvertletTable.java?line=1194 <https://insight.io/github.
> com/apache/calcite/blob/master/core/src/main/java/org/
> apache/calcite/sql2rel/StandardConvertletTable.java?line=1194>
>
> [2] https://insight.io/github.com/apache/calcite/blob/master/
> core/src/main/java/org/apache/calcite/rel/rules/
> AggregateReduceFunctionsRule.java?line=82 <https://insight.io/github.
> com/apache/calcite/blob/master/core/src/main/java/org/
> apache/calcite/rel/rules/AggregateReduceFunctionsRule.java?line=82>
>
>
> > On Jan 3, 2018, at 4:20 PM, Michael Alexeev <[email protected]>
> wrote:
> >
> > Hi All,
> >
> > Please consider a simple aggregated query
> >
> > select avg(C) from T;
> >
> > which is translated into a following relational expression
> >
> > LogicalAggregate(group=[{}], EXPR$0=[AVG($0)])
> >  LogicalProject(C=[$2])
> >    VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}],
> > proj#0..5=[{exprs}])
> >
> > And this works perfectly fine if a table physically located on a single
> > physical machine.
> >
> > But in case of a distributed table (partitioned table where each
> partition
> > resides on a different physical node, for example) this query needs to be
> > run on each individual node, each node's results are sent to a dedicated
> > coordinator node which does the final aggregation across the nodes. To
> > calculate a column's AVG  in this case, each node has to calculate two
> > aggregates SUM(T.C) and COUNT(T.C) and the coordinator node has also
> > compute the SUM and COUNT across the nodes and divide the final SUM by
> the
> > final COUNT.
> >
> > I wonder if there is a way to force Calcite to translate the query above
> > into
> > LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[COUNT($0)])
> >  LogicalProject(C=[$2])
> >    VoltDBTableSeqScan(table=[[T]], expr#0..5=[{inputs}],
> > proj#0..5=[{exprs}])
> >
> > instead of a single AVG aggregate?
> >
> > Thanks,
> > Mike
>
>

Reply via email to