[
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714573#comment-17714573
]
Benedict Elliott Smith edited comment on CASSANDRA-18470 at 4/20/23 12:35 PM:
------------------------------------------------------------------------------
I think this is ambiguous, to be honest. In general we have both _considered_ and
_documented_ our behaviour for these kinds of features and data types very
inadequately. However, it is not immediately obvious that this behaviour is
_incorrect_: we do not ask the user to specify a precision for the output, and
since we support arbitrary precision we have to make some decision based on the
inputs. In this case neither input has any fractional component, so the result
is rounded to the same (integer) scale.
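To make the scale-driven rounding concrete, here is a minimal {{java.math.BigDecimal}} sketch. It is not a claim about the exact code path we take, just the standard-library behaviour that produces this kind of result: {{divide(divisor, roundingMode)}} rounds to the dividend's scale, so two integer-scaled operands can only ever yield an integer-scaled quotient.
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // divide(divisor, roundingMode) rounds to the dividend's scale, so
        // integer-scaled inputs produce an integer-scaled quotient.
        System.out.println(new BigDecimal("3").divide(new BigDecimal("2"), RoundingMode.HALF_EVEN));   // 2
        // Give the dividend a fractional scale and the quotient keeps it.
        System.out.println(new BigDecimal("3.0").divide(new BigDecimal("2"), RoundingMode.HALF_EVEN)); // 1.5
    }
}
{code}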
There's an argument to be made that this is really inappropriate for an
aggregation, as the order in which values occur in the aggregation affects the
result. But I think the correct solution is probably to permit a precision to
be provided with the operator. We could plausibly also pick a default precision
that is non-zero, though this might constrain the precision below an acceptable
level for some workloads. We could permit the user to configure a default
precision for this operator, and/or use the default precision as a lower bound
only.
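As a sketch of what "a precision provided with the operator, with a configurable default used as a lower bound" might look like internally — the helper, the {{MathContext.DECIMAL64}} default and the lower-bound rule are all assumptions for illustration, not an existing option:
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class PrecisionChoiceDemo {
    // Assumed default precision: DECIMAL64 = 16 significant digits, HALF_EVEN.
    static final MathContext DEFAULT = MathContext.DECIMAL64;

    // Hypothetical helper: use the caller-supplied precision if it is at least
    // the configured default; otherwise the default acts as a lower bound.
    static BigDecimal average(BigDecimal sum, long count, MathContext requested) {
        MathContext mc = (requested != null && requested.getPrecision() >= DEFAULT.getPrecision())
                ? requested : DEFAULT;
        return sum.divide(BigDecimal.valueOf(count), mc);
    }

    public static void main(String[] args) {
        BigDecimal sum = new BigDecimal("3"); // 1 + 2, both integer-scaled
        System.out.println(average(sum, 2, null));                                        // 1.5
        System.out.println(average(sum, 2, new MathContext(34, RoundingMode.HALF_EVEN))); // 1.5
    }
}
{code}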
Probably our implementation is wrong, though, given this behaviour. It seems
that we assume we have good precision and therefore recompute the average on
each new datum, as opposed to maintaining a running sum and count. Maintaining
a running sum and count, and dividing only once at the end, would also solve
the problem of the order in which values are provided affecting the output.
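To illustrate both points, here is a sketch contrasting an incremental update performed at the inputs' scale (roughly the behaviour described above, not a quote of the actual aggregate code) with a running sum and count divided once at the end:
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class AvgOrderDemo {
    // Incremental update at the inputs' scale: lossy and order-dependent.
    static BigDecimal incrementalAvg(BigDecimal... values) {
        BigDecimal avg = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal v : values) {
            count++;
            // divide(..., RoundingMode) keeps the dividend's scale, so with
            // integer-scaled inputs each correction is rounded to a whole number.
            avg = avg.add(v.subtract(avg).divide(BigDecimal.valueOf(count), RoundingMode.HALF_EVEN));
        }
        return avg;
    }

    // Running sum and count, divided once at the end: order-independent.
    static BigDecimal sumCountAvg(BigDecimal... values) {
        BigDecimal sum = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal v : values) {
            count++;
            sum = sum.add(v); // BigDecimal addition is exact
        }
        return sum.divide(BigDecimal.valueOf(count), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1"), two = new BigDecimal("2");
        System.out.println(incrementalAvg(one, two)); // 1
        System.out.println(incrementalAvg(two, one)); // 2  (order changes the answer)
        System.out.println(sumCountAvg(one, two));    // 1.5
        System.out.println(sumCountAvg(two, one));    // 1.5
    }
}
{code}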
> Average of "decimal" values rounds the average if all inputs are integers
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
> Issue Type: Bug
> Reporter: Nadav Har'El
> Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an
> arbitrary-precision number which may be integral or fractional, but the
> average is expected to be, in general, fractional. It turns out, however,
> that if all the values are integers *without* a ".0", the aggregator sums them
> up as integers and the final division also returns an integer, instead of the
> fractional result expected from a "decimal" value.
> For example:
> # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
> # AVG of 1.0 and 2 or 1 and 2.0 also returns 1.5.
> # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the
> average to be a "decimal", not a "varint", so there is no reason why it
> should be rounded to an integer.