mihaibudiu commented on code in PR #3733: URL: https://github.com/apache/calcite/pull/3733#discussion_r1575204316
########## babel/src/test/resources/sql/redshift.iq: ########## @@ -523,78 +523,78 @@ select deptno, ratio_to_report(sal) over (partition by deptno) from emp; !} # STDDEV_POP -select empno, stddev_pop(comm) over (order by empno rows unbounded preceding) from emp where deptno = 30 order by 1; +select empno, stddev_pop(CAST(comm AS DECIMAL(12, 4))) over (order by empno rows unbounded preceding) from emp where deptno = 30 order by 1; Review Comment: Initially I had filed issue https://issues.apache.org/jira/browse/CALCITE-6324 about this problem. See the comments there as well. Indeed, in Calcite, as in Oracle, the result type of an aggregate is the same as the type of the value aggregated. Moreover, the Calcite implementation of aggregates also uses *internally* the same type for all calculations. Since STDDEV needs to square values, it requires double the precision not to overflow. So without changing the type here, this program would fail because of overflows for decimal values (prior to this PR such overflows were simply ignored, which is wrong). The problem of internal overflows could be solved in two ways: (1) change the internal implementation of all aggregation functions that could overflow to use a different precision for intermediate results. (2) require the user to use a type wide enough to accommodate possible overflows Now, unfortunately the compiler cannot choose safely a type to guarantee that the results won't overflow, even for a simple aggregate such as AVG. The reason is that, although the compiler has a bound on the precision of the values aggregated, in general it has no bound on the number of rows in the table at runtime (especially for cases like incremental view maintenance, where the plans generated need to work even for future contents of the tables). So no matter what precision the compiler may choose for internal computations, this precision may be exceeded at runtime for some input data. This leaves the other alternative: the user should choose a precision high enough to carry all computations. Although the user may also not know the size of the tables, at least the user has some domain-specific knowledge. So this PR adopts the approach in (2), which is also compatible with the existing Calcite implementation of aggregates. If we want to implement (1), it will require much more work, we have to modify the implementation of all Calcite aggregates, choosing an arbitrary type for intermediate results. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
