Pinhan Zhao created CALCITE-6166:
------------------------------------
Summary: Optimization rule AggregateReduceFunctionsRule rewrites
queries to semantically non-equivalent ones
Key: CALCITE-6166
URL: https://issues.apache.org/jira/browse/CALCITE-6166
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.36.0
Reporter: Pinhan Zhao
The AggregateReduceFunctionsRule rewrites AVG to SUM divided by COUNT and
applies a cast. However, the type being casted to is erroneously determined -
it always converts the SUM / COUNT expression to INTEGER, which is inconsistent
with the default return type of AVG originally. Therefore, if the average
value originally evaluates to a non-integer number, the expression after
rewriting will lost the floating point precision as being converted to an
integer.
This issue is manifested in various test cases
([https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java,]
and expected outputs
https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml).
For example, the rule testReduceAverageAndSumWithNoReduceStddevAndVar rewrites
the original query
{code:java}
select name, stddev_pop(deptno), avg(deptno), var_pop(deptno) from dept group
by name{code}
to
{code:java}
select name, stddev_pop(deptno), cast(coalesce(sum(deptno), 0) / count(*) as
integer), var_pop(deptno) from dept group by name{code}
However, with a counterexample
{code:java}
CREATE TABLE dept (
deptno INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
INSERT INTO dept VALUES (1, 'Sales');
INSERT INTO dept VALUES (2, 'Sales');
{code}
The original query and the optimized query produce different outputs:
{code:java}
name | stddev_pop | avg | var_pop
-------+------------------------+--------------------+------------------------
Sales | 0.50000000000000000000 | 1.5000000000000000 | 0.25000000000000000000
(1 row) name | stddev_pop | int4 | var_pop
-------+------------------------+------+------------------------
Sales | 0.50000000000000000000 | 1 | 0.25000000000000000000
(1 row) {code}
Specifically, the values of avg (1.5 vs 1) are different.
In addition to the above case, the following test cases are also affected:
* testReduceAverageAndVarWithNoReduceStddev
* testCastInAggregateReduceFunctions
* testReduceAverage
* testReduceWithNonTypePredicate
* testReduceAverageWithNoReduceSum
* testReduceAllAggregateFunctions
--
This message was sent by Atlassian Jira
(v8.20.10#820010)