mihaibudiu commented on code in PR #3733:
URL: https://github.com/apache/calcite/pull/3733#discussion_r1575204316


##########
babel/src/test/resources/sql/redshift.iq:
##########
@@ -523,78 +523,78 @@ select deptno, ratio_to_report(sal) over (partition by 
deptno) from emp;
 !}
 
 # STDDEV_POP
-select empno, stddev_pop(comm) over (order by empno rows unbounded preceding) 
from emp where deptno = 30 order by 1;
+select empno, stddev_pop(CAST(comm AS DECIMAL(12, 4))) over (order by empno 
rows unbounded preceding) from emp where deptno = 30 order by 1;

Review Comment:
   Initially I had filed issue 
https://issues.apache.org/jira/browse/CALCITE-6324 about this problem. See the 
comments there as well. Indeed, in Calcite, as in Oracle, the result type of an 
aggregate is the same as the type of the value aggregated. 
   
   Moreover, the Calcite implementation of aggregates also uses *internally* 
the same type for all calculations. Since STDDEV needs to square values, it 
requires double the precision not to overflow. So without changing the type 
here, this program would fail because of overflows for decimal values (prior to 
this PR such overflows were simply ignored, which is wrong).
   
   The problem of internal overflows could be solved in two ways:
   (1) change the internal implementation of all aggregation functions that 
could overflow to use a different precision for intermediate results. 
   (2) require the user to use a type wide enough to accommodate possible 
overflows
   
   Now, unfortunately the compiler cannot choose safely a type to guarantee 
that the results won't overflow, even for a simple aggregate such as AVG. The 
reason is that, although the compiler has a bound on the precision of the 
values aggregated, in general it has no bound on the number of rows in the 
table at runtime (especially for cases like incremental view maintenance, where 
the plans generated need to work even for future contents of the tables). So no 
matter what precision the compiler may choose for internal computations, this 
precision may be exceeded at runtime for some input data. 
   
   This leaves the other alternative: the user should choose a precision high 
enough to carry all computations. Although the user may also not know the size 
of the tables, at least the user has some domain-specific knowledge. So this PR 
adopts the approach in (2), which is also compatible with the existing Calcite 
implementation of aggregates.
   
   If we want to implement (1), it will require much more work, we have to 
modify the implementation of all Calcite aggregates, choosing an arbitrary type 
for intermediate results.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to