Pinhan Zhao created CALCITE-6166:
------------------------------------

             Summary: Optimization rule AggregateReduceFunctionsRule rewrites 
queries to semantically non-equivalent ones
                 Key: CALCITE-6166
                 URL: https://issues.apache.org/jira/browse/CALCITE-6166
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.36.0
            Reporter: Pinhan Zhao


The AggregateReduceFunctionsRule rewrites AVG to SUM divided by COUNT and 
applies a cast.  However, the type being casted to is erroneously determined - 
it always converts the SUM / COUNT expression to INTEGER, which is inconsistent 
with the default return type of AVG originally.  Therefore, if the average 
value originally evaluates to a non-integer number, the expression after 
rewriting will lost the floating point precision as being converted to an 
integer.

 

This issue is manifested in various test cases 
([https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java,]
 and expected outputs 
https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml).
 

 

For example, the rule testReduceAverageAndSumWithNoReduceStddevAndVar rewrites 
the original query
{code:java}
select name, stddev_pop(deptno), avg(deptno), var_pop(deptno) from dept group 
by name{code}
to
{code:java}
select name, stddev_pop(deptno), cast(coalesce(sum(deptno), 0) / count(*) as 
integer), var_pop(deptno) from dept group by name{code}
 

However, with a counterexample
{code:java}
CREATE TABLE dept (
  deptno INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);
INSERT INTO dept VALUES (1, 'Sales');
INSERT INTO dept VALUES (2, 'Sales');
{code}
 

The original query and the optimized query produce different outputs:
{code:java}
 name  |       stddev_pop       |        avg         |        var_pop         
-------+------------------------+--------------------+------------------------
 Sales | 0.50000000000000000000 | 1.5000000000000000 | 0.25000000000000000000

(1 row) name  |       stddev_pop       | int4 |        var_pop         
-------+------------------------+------+------------------------
 Sales | 0.50000000000000000000 |    1 | 0.25000000000000000000
(1 row) {code}
Specifically, the values of avg (1.5 vs 1) are different.

 

In addition to the above case, the following test cases are also affected:
 * testReduceAverageAndVarWithNoReduceStddev
 * testCastInAggregateReduceFunctions
 * testReduceAverage
 * testReduceWithNonTypePredicate
 * testReduceAverageWithNoReduceSum
 * testReduceAllAggregateFunctions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to