[ 
https://issues.apache.org/jira/browse/CALCITE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800251#comment-17800251
 ] 

Caican Cai edited comment on CALCITE-6166 at 12/25/23 6:48 AM:
---------------------------------------------------------------

[~julianhyde] I think this is difficult to do. I have done tests on mysql, tidb 
and other databases, and they seem to be unable to guarantee 100% conversion to 
the correct type. Here I think there is an issue that needs to be considered. 
If the double type is used, This may cause loss of accuracy when the value 
exceeds the range of double.

[The data returned by the select statement is incorrect · Issue #48860 · 
pingcap/tidb (github.com)|https://github.com/pingcap/tidb/issues/48860]


was (Author: JIRAUSER302115):
[~julianhyde] I think this is difficult to do. I have done tests on mysql, tidb 
and other databases, and they seem to be unable to guarantee 100% conversion to 
the correct type. Here I think there is an issue that needs to be considered. 
If the double type is used, This may cause loss of accuracy when the value 
exceeds the range of double.

> Optimization rule AggregateReduceFunctionsRule rewrites queries to 
> semantically non-equivalent ones
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6166
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6166
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.36.0
>            Reporter: Pinhan Zhao
>            Priority: Blocker
>
> The AggregateReduceFunctionsRule rewrites AVG to SUM divided by COUNT and 
> applies a cast.  However, the type being casted to is erroneously determined 
> - it always converts the SUM / COUNT expression to INTEGER, which is 
> inconsistent with the default return type of AVG originally.  Therefore, if 
> the average value originally evaluates to a non-integer number, the 
> expression after rewriting will lost the floating point precision as being 
> converted to an integer.
>  
> This issue is manifested in various test cases 
> ([https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java,]
>  and expected outputs 
> [https://github.com/apache/calcite/blob/08f685683e8e6e97b47189d95603ccccaf17051d/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml]).
>  
>  
> For example, the rule testReduceAverageAndSumWithNoReduceStddevAndVar 
> rewrites the original query
> {code:java}
> select name, stddev_pop(deptno), avg(deptno), var_pop(deptno) from dept group 
> by name{code}
> to
> {code:java}
> select name, stddev_pop(deptno), cast(coalesce(sum(deptno), 0) / count(*) as 
> integer), var_pop(deptno) from dept group by name{code}
>  
> However, with a counterexample
> {code:java}
> CREATE TABLE dept (
>   deptno INTEGER PRIMARY KEY,
>   name TEXT NOT NULL
> );
> INSERT INTO dept VALUES (1, 'Sales');
> INSERT INTO dept VALUES (2, 'Sales');
> {code}
>  
> The original query and the optimized query produce different outputs:
> {code:java}
>  name  |       stddev_pop       |        avg         |        var_pop         
> -------+------------------------+--------------------+------------------------
>  Sales | 0.50000000000000000000 | 1.5000000000000000 | 0.25000000000000000000
> (1 row)
>  name  |       stddev_pop       | int4 |        var_pop         
> -------+------------------------+------+------------------------
>  Sales | 0.50000000000000000000 |    1 | 0.25000000000000000000
> (1 row) {code}
> Specifically, the values of avg (1.5 vs 1) are different.
>  
> In addition to the above case, the following test cases are also affected:
>  * testReduceAverageAndVarWithNoReduceStddev
>  * testCastInAggregateReduceFunctions
>  * testReduceAverage
>  * testReduceWithNonTypePredicate
>  * testReduceAverageWithNoReduceSum
>  * testReduceAllAggregateFunctions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to