[ https://issues.apache.org/jira/browse/CALCITE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797308#comment-17797308 ]
Julian Hyde commented on CALCITE-6166: -------------------------------------- I agree that the rule should preserve the type. Depending on the {{RelDataTypeSystem}}, the return type of an {{AVG}} call may or may not be the default type. Ideally the rule would have the correct behavior regardless of {{RelDataTypeSystem}}. Is that possible? > Optimization rule AggregateReduceFunctionsRule rewrites queries to > semantically non-equivalent ones > --------------------------------------------------------------------------------------------------- > > Key: CALCITE-6166 > URL: https://issues.apache.org/jira/browse/CALCITE-6166 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.36.0 > Reporter: Pinhan Zhao > Priority: Blocker > > The AggregateReduceFunctionsRule rewrites AVG to SUM divided by COUNT and > applies a cast. However, the type being casted to is erroneously determined > - it always converts the SUM / COUNT expression to INTEGER, which is > inconsistent with the default return type of AVG originally. Therefore, if > the average value originally evaluates to a non-integer number, the > expression after rewriting will lost the floating point precision as being > converted to an integer. > > This issue is manifested in various test cases > ([https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java,] > and expected outputs > [https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml]). > > > For example, the rule testReduceAverageAndSumWithNoReduceStddevAndVar > rewrites the original query > {code:java} > select name, stddev_pop(deptno), avg(deptno), var_pop(deptno) from dept group > by name{code} > to > {code:java} > select name, stddev_pop(deptno), cast(coalesce(sum(deptno), 0) / count(*) as > integer), var_pop(deptno) from dept group by name{code} > > However, with a counterexample > {code:java} > CREATE TABLE dept ( > deptno INTEGER PRIMARY KEY, > name TEXT NOT NULL > ); > INSERT INTO dept VALUES (1, 'Sales'); > INSERT INTO dept VALUES (2, 'Sales'); > {code} > > The original query and the optimized query produce different outputs: > {code:java} > name | stddev_pop | avg | var_pop > -------+------------------------+--------------------+------------------------ > Sales | 0.50000000000000000000 | 1.5000000000000000 | 0.25000000000000000000 > (1 row) > name | stddev_pop | int4 | var_pop > -------+------------------------+------+------------------------ > Sales | 0.50000000000000000000 | 1 | 0.25000000000000000000 > (1 row) {code} > Specifically, the values of avg (1.5 vs 1) are different. > > In addition to the above case, the following test cases are also affected: > * testReduceAverageAndVarWithNoReduceStddev > * testCastInAggregateReduceFunctions > * testReduceAverage > * testReduceWithNonTypePredicate > * testReduceAverageWithNoReduceSum > * testReduceAllAggregateFunctions -- This message was sent by Atlassian Jira (v8.20.10#820010)