[ 
https://issues.apache.org/jira/browse/CALCITE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797308#comment-17797308
 ] 

Julian Hyde commented on CALCITE-6166:
--------------------------------------

I agree that the rule should preserve the type. Depending on the 
{{RelDataTypeSystem}}, the return type of an {{AVG}} call may or may not be the 
default type. Ideally the rule would have the correct behavior regardless of 
{{RelDataTypeSystem}}. Is that possible?

> Optimization rule AggregateReduceFunctionsRule rewrites queries to 
> semantically non-equivalent ones
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6166
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6166
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.36.0
>            Reporter: Pinhan Zhao
>            Priority: Blocker
>
> The AggregateReduceFunctionsRule rewrites AVG to SUM divided by COUNT and 
> applies a cast.  However, the type being casted to is erroneously determined 
> - it always converts the SUM / COUNT expression to INTEGER, which is 
> inconsistent with the default return type of AVG originally.  Therefore, if 
> the average value originally evaluates to a non-integer number, the 
> expression after rewriting will lost the floating point precision as being 
> converted to an integer.
>  
> This issue is manifested in various test cases 
> ([https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java,]
>  and expected outputs 
> [https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml]).
>  
>  
> For example, the rule testReduceAverageAndSumWithNoReduceStddevAndVar 
> rewrites the original query
> {code:java}
> select name, stddev_pop(deptno), avg(deptno), var_pop(deptno) from dept group 
> by name{code}
> to
> {code:java}
> select name, stddev_pop(deptno), cast(coalesce(sum(deptno), 0) / count(*) as 
> integer), var_pop(deptno) from dept group by name{code}
>  
> However, with a counterexample
> {code:java}
> CREATE TABLE dept (
>   deptno INTEGER PRIMARY KEY,
>   name TEXT NOT NULL
> );
> INSERT INTO dept VALUES (1, 'Sales');
> INSERT INTO dept VALUES (2, 'Sales');
> {code}
>  
> The original query and the optimized query produce different outputs:
> {code:java}
>  name  |       stddev_pop       |        avg         |        var_pop         
> -------+------------------------+--------------------+------------------------
>  Sales | 0.50000000000000000000 | 1.5000000000000000 | 0.25000000000000000000
> (1 row)
>  name  |       stddev_pop       | int4 |        var_pop         
> -------+------------------------+------+------------------------
>  Sales | 0.50000000000000000000 |    1 | 0.25000000000000000000
> (1 row) {code}
> Specifically, the values of avg (1.5 vs 1) are different.
>  
> In addition to the above case, the following test cases are also affected:
>  * testReduceAverageAndVarWithNoReduceStddev
>  * testCastInAggregateReduceFunctions
>  * testReduceAverage
>  * testReduceWithNonTypePredicate
>  * testReduceAverageWithNoReduceSum
>  * testReduceAllAggregateFunctions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to