[
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714573#comment-17714573
]
Benedict Elliott Smith edited comment on CASSANDRA-18470 at 4/20/23 12:35 PM:
------------------------------------------------------------------------------
I think this is ambiguous, to be honest. In general we have both _considered_ and
_documented_ our behaviour for these kinds of features and data types very
inadequately. However, it is not immediately obvious that this behaviour is
_incorrect_: we do not ask the user to specify a precision for the output, and
since we support arbitrary precision we have to make some decision based on the
inputs. In this case neither input has any fractional component, so the result
is rounded to the same (integer) scale.
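To make the scale-driven rounding concrete, here is a minimal {{java.math.BigDecimal}} sketch. It is not a claim about the exact code path we take, just the standard-library behaviour that produces this kind of result: {{divide(divisor, roundingMode)}} rounds to the dividend's scale, so two integer-scaled operands can only ever yield an integer-scaled quotient.
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // divide(divisor, roundingMode) rounds to the dividend's scale, so
        // integer-scaled inputs produce an integer-scaled quotient.
        System.out.println(new BigDecimal("3").divide(new BigDecimal("2"), RoundingMode.HALF_EVEN));   // 2
        // Give the dividend a fractional scale and the quotient keeps it.
        System.out.println(new BigDecimal("3.0").divide(new BigDecimal("2"), RoundingMode.HALF_EVEN)); // 1.5
    }
}
{code}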
There's an argument to be made that this is really inappropriate for an
aggregation, as the order in which values occur in the aggregation affects the
result. But I think the correct solution is probably to permit a precision to
be provided with the operator. We could plausibly also pick a default precision
that is non-zero, though this might constrain the precision below an acceptable
level for some workloads. We could permit the user to configure a default
precision for this operator, and/or use the default precision as a lower bound
only.
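As a sketch of what "a precision provided with the operator, with a configurable default used as a lower bound" might look like internally — the helper, the {{MathContext.DECIMAL64}} default and the lower-bound rule are all assumptions for illustration, not an existing option:
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class PrecisionChoiceDemo {
    // Assumed default precision: DECIMAL64 = 16 significant digits, HALF_EVEN.
    static final MathContext DEFAULT = MathContext.DECIMAL64;

    // Hypothetical helper: use the caller-supplied precision if it is at least
    // the configured default; otherwise the default acts as a lower bound.
    static BigDecimal average(BigDecimal sum, long count, MathContext requested) {
        MathContext mc = (requested != null && requested.getPrecision() >= DEFAULT.getPrecision())
                ? requested : DEFAULT;
        return sum.divide(BigDecimal.valueOf(count), mc);
    }

    public static void main(String[] args) {
        BigDecimal sum = new BigDecimal("3"); // 1 + 2, both integer-scaled
        System.out.println(average(sum, 2, null));                                        // 1.5
        System.out.println(average(sum, 2, new MathContext(34, RoundingMode.HALF_EVEN))); // 1.5
    }
}
{code}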
Probably our implementation is wrong, though, given this behaviour. It seems
that we assume we have good precision and therefore recompute the average on
each new datum, as opposed to maintaining a running sum and count. Maintaining
a running sum and count, and dividing only once at the end, would also solve
the problem of the order in which values are provided affecting the output.
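To illustrate both points, here is a sketch contrasting an incremental update performed at the inputs' scale (roughly the behaviour described above, not a quote of the actual aggregate code) with a running sum and count divided once at the end:
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class AvgOrderDemo {
    // Incremental update at the inputs' scale: lossy and order-dependent.
    static BigDecimal incrementalAvg(BigDecimal... values) {
        BigDecimal avg = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal v : values) {
            count++;
            // divide(..., RoundingMode) keeps the dividend's scale, so with
            // integer-scaled inputs each correction is rounded to a whole number.
            avg = avg.add(v.subtract(avg).divide(BigDecimal.valueOf(count), RoundingMode.HALF_EVEN));
        }
        return avg;
    }

    // Running sum and count, divided once at the end: order-independent.
    static BigDecimal sumCountAvg(BigDecimal... values) {
        BigDecimal sum = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal v : values) {
            count++;
            sum = sum.add(v); // BigDecimal addition is exact
        }
        return sum.divide(BigDecimal.valueOf(count), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1"), two = new BigDecimal("2");
        System.out.println(incrementalAvg(one, two)); // 1
        System.out.println(incrementalAvg(two, one)); // 2  (order changes the answer)
        System.out.println(sumCountAvg(one, two));    // 1.5
        System.out.println(sumCountAvg(two, one));    // 1.5
    }
}
{code}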
> Average of "decimal" values rounds the average if all inputs are integers
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
> Issue Type: Bug
> Reporter: Nadav Har'El
> Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an
> arbitrary-precision number which may be integral or fractional, but the
> average is expected to be, in general, fractional. It turns out, however,
> that if all the values are integers *without* a ".0", the aggregator sums them
> up as integers and the final division also returns an integer, instead of the
> fractional result expected from a "decimal" value.
> For example:
> # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
> # AVG of 1.0 and 2 or 1 and 2.0 also returns 1.5.
> # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the
> average to be a "decimal", not a "varint", so there is no reason why it
> should be rounded to an integer.