[
https://issues.apache.org/jira/browse/CALCITE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826852#comment-17826852
]
Mihai Budiu commented on CALCITE-6324:
--------------------------------------
Indeed, it is not clear what the precision and scale of the result should be
for some of these functions.
The precision required to fit the result depends not only on the value type,
but also on the table size.
If we leave the current type inference rules for these functions a program like
the the following:
{code:sql}
select var_samp(sal) from emp;
{code}
which appears in redshift.iq, will fail with a runtime error because the result
~1398313.87 cannot be represented in the type DECIMAL(7, 2), the type of
EMP.SAL.
SQL server uses a higher precision even for short integer types for AVG:
https://learn.microsoft.com/en-us/sql/t-sql/functions/avg-transact-sql?view=sql-server-ver16#return-types
SQL server also returns FLOAT for the statistical aggregates.
Postgres seems to use an "arbitrary precision" decimal for these values
(Postgres DECIMALS can have up to billions of digits), based on testing.
Oracle uses for the result the same type as the source data type:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/STDDEV.html
In my PR I have used DECIMAL(2*precision, 2*scale) for the result for this
class of functions, where DECIMAL(precision, scale) is the precision of the
source data.
I think that a larger result type is more user-friendly, since fewer programs
will surprisingly fail at runtime if people forget to cast the data.
If we keep the current type for the results we will have to adjust the
accumulator types and change quite a few test programs to insert explicit casts
before the aggregation. (I will have to check, we may need to change the
accumulator types anyway, didn't look to see exactly how the evaluation is
done.)
Happy for a suggestion to solve this.
> Type inferred for result of STDDEV, VAR_SAMP, etc. is incorrect
> ---------------------------------------------------------------
>
> Key: CALCITE-6324
> URL: https://issues.apache.org/jira/browse/CALCITE-6324
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.36.0
> Reporter: Mihai Budiu
> Assignee: Mihai Budiu
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.37.0
>
>
> These functions are all use the same type inference algorithm, essentially
> the algorithm used by AVG.
> But if the values processed are decimal, STDDEV (and others) need much higher
> precision to represent the result. (I am not sure that the inference is right
> for integer types either, btw.)
> This surfaced during the implementation of a fix for [CALCITE-6322]: if we
> use the type inferred for these functions, the result overflows and causes a
> runtime exception.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)