pitrou commented on PR #44184: URL: https://github.com/apache/arrow/pull/44184#issuecomment-2875691410
> If I understand the code correctly, what happens for the mean for decimals is that it first computes the sum by using the underlying `+` of the Decimal type, then computes the count of values as a long, and does an integer division of the two. As seen in this commit, that `+` operator essentially ignores precision. I think this is a reasonable implementation except:
>
> * There should be a checked variant;
> * Potentially, you'd want to add some extra scale, in line with the binary division operation.

Ideally, the scale should be "floating", just as in floating-point arithmetic, depending on the current running sum (the running sum can be very large if all the data is positive, or very small if the data is centered around zero). It would then be normalized to the original scale at the end. But of course that makes the algorithm more involved. (That would also eliminate the need for a checked variant?)

> So I'd propose for product either demoting to double (will give a semantically correct value but lose precision) or maxing out the width of the decimal type. Of course, there should also be a checked variant (but I'm deferring that until after this commit is complete).

Either is fine to me.
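To make the extra-scale idea concrete, here is a minimal C++ sketch. It assumes a deliberately simplified model in which a decimal value is just its unscaled `__int128` integer plus an implicit fixed scale; `kExtraScale`, `Pow10`, and `DecimalMean` are illustrative names, not Arrow's actual API, and the chosen extra scale of 4 is an assumption for the example.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Simplified stand-in for a 128-bit decimal: just the unscaled integer.
// (__int128 is a GCC/Clang extension; Arrow's real Decimal128 also tracks
// precision and scale in the type.)
using int128 = __int128;

// Hypothetical amount of extra scale added before the division, mirroring
// the idea of the binary decimal division operation.
constexpr int kExtraScale = 4;

int128 Pow10(int n) {
  int128 p = 1;
  while (n-- > 0) p *= 10;
  return p;
}

// Mean: accumulate with the plain integer `+` (precision is not tracked,
// matching the behavior described above), upscale the sum, then
// integer-divide by the count. The result's scale is the input scale plus
// kExtraScale.
int128 DecimalMean(const std::vector<int128>& values) {
  int128 sum = 0;
  for (int128 v : values) sum += v;  // unchecked: can silently overflow
  return (sum * Pow10(kExtraScale)) / static_cast<int128>(values.size());
}

int main() {
  // Inputs at scale 2: 1.00, 2.00, 2.50. Exact mean is 1.8333...
  std::vector<int128> values = {100, 200, 250};
  // Prints 1833333, i.e. 1.833333 at scale 2 + 4 = 6.
  std::cout << static_cast<long long>(DecimalMean(values)) << "\n";
  return 0;
}
```

Without the extra scale, the integer division would truncate the mean to 1.83 at scale 2; the "floating" variant discussed above would instead adapt the scale dynamically to the magnitude of the running sum.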
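Likewise, the two options proposed for product could look roughly as follows under the same simplified model; `ProductAsDouble` and `ProductMaxWidth` are hypothetical helpers for illustration, not proposed Arrow function names.

```cpp
#include <cmath>
#include <vector>

using int128 = __int128;  // same simplified unscaled-integer model as above

// Option 1: demote to double. The value is semantically right, but anything
// beyond roughly 15-16 significant digits is lost.
double ProductAsDouble(const std::vector<int128>& values, int scale) {
  const double factor = std::pow(10.0, scale);
  double result = 1.0;
  for (int128 v : values) result *= static_cast<double>(v) / factor;
  return result;
}

// Option 2: stay decimal and max out the type's width (precision 38 for a
// 128-bit decimal). Each multiplication adds the operands' scales, so the
// result scale grows with the number of inputs, and the unscaled value can
// silently overflow -- which is why a checked variant is wanted eventually.
int128 ProductMaxWidth(const std::vector<int128>& values) {
  int128 result = 1;
  for (int128 v : values) result *= v;  // unchecked multiply
  return result;
}
```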