pitrou commented on PR #44184:
URL: https://github.com/apache/arrow/pull/44184#issuecomment-2875691410

   > If I understand the code correctly, what happens for the mean for decimals 
is that it first computes the sum by using the underlying `+` of the Decimal 
type, then computes the count of values as a long, and does an integer division 
of the two. As seen in this commit, that `+` operator essentially ignores 
precision. I think this is a reasonable implementation except:
   >     * There should be a checked variant;
   >     * Potentially, you'd want to add some extra scale, in line with the 
binary division operation.
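
   For concreteness, a toy sketch of the strategy described in the quote (this is not Arrow code; `Decimal` below is a hypothetical stand-in that models a decimal as an int64 unscaled value plus a scale): the sum uses a plain, unchecked integer `+`, the count is an int64, and the mean is an integer division that keeps the input scale.

   ```cpp
   #include <cstdint>
   #include <iostream>
   #include <vector>

   // Toy stand-in for a fixed-point decimal: value = unscaled * 10^(-scale).
   struct Decimal {
     int64_t unscaled;
     int32_t scale;
   };

   Decimal DecimalMean(const std::vector<Decimal>& values, int32_t scale) {
     int64_t sum = 0;
     int64_t count = 0;
     for (const Decimal& v : values) {
       sum += v.unscaled;  // unchecked '+': overflow and precision are ignored
       ++count;
     }
     // Integer division; the result keeps the input scale, so any fractional
     // digits beyond that scale are truncated.
     return Decimal{count == 0 ? 0 : sum / count, scale};
   }

   int main() {
     // 1.25, 2.50, 4.00 at scale 2 -> exact mean 2.5833..., truncated to 2.58.
     std::vector<Decimal> vals = {{125, 2}, {250, 2}, {400, 2}};
     Decimal m = DecimalMean(vals, 2);
     std::cout << m.unscaled << " (scale " << m.scale << ")\n";  // 258 (scale 2)
   }
   ```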
   
   Ideally, the scale should be "floating" just as in floating-point 
arithmetic, depending on the current running sum (the running sum can be very 
large if all data is positive, or very small if the data is centered around 
zero). The accumulated sum would then be normalized back to the original scale 
at the end. But of course that makes the algorithm more involved.
   
   (that would also eliminate the need for a checked variant?)
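
   To make this more concrete, here is a rough sketch of the "floating scale" idea as I mean it (again a toy model, not an implementation proposal): the running sum carries its own scale, drops a trailing digit whenever it approaches the accumulator's limits, and the final mean is rescaled back to the input scale. The `kHeadroom` threshold is an arbitrary placeholder.

   ```cpp
   #include <cstdint>
   #include <cstdlib>
   #include <iostream>
   #include <vector>

   struct Decimal {
     int64_t unscaled;  // value = unscaled * 10^(-scale)
     int32_t scale;
   };

   Decimal FloatingScaleMean(const std::vector<Decimal>& values, int32_t in_scale) {
     int64_t sum = 0;
     int32_t sum_scale = in_scale;              // scale currently carried by the sum
     int64_t count = 0;
     const int64_t kHeadroom = INT64_MAX / 16;  // arbitrary safety threshold
     for (const Decimal& v : values) {
       int64_t x = v.unscaled;
       // Bring the incoming value down to the accumulator's current scale.
       for (int32_t s = v.scale; s > sum_scale; --s) x /= 10;
       if (std::llabs(sum) > kHeadroom || std::llabs(x) > kHeadroom) {
         // "Float" the scale: drop one fractional digit from both operands.
         sum /= 10;
         x /= 10;
         --sum_scale;
       }
       sum += x;
       ++count;
     }
     int64_t mean = count == 0 ? 0 : sum / count;
     // Normalize back to the original scale at the end.
     for (int32_t s = sum_scale; s < in_scale; ++s) mean *= 10;
     return Decimal{mean, in_scale};
   }

   int main() {
     std::vector<Decimal> vals = {{125, 2}, {250, 2}, {400, 2}};
     Decimal m = FloatingScaleMean(vals, 2);
     std::cout << m.unscaled << " (scale " << m.scale << ")\n";  // 258 (scale 2)
   }
   ```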
   
   > So I'd propose for product either demoting to double (will give a 
semantically correct value but lose precision) or maxing out the width of the 
decimal type. Of course, there should also be a checked variant (but I'm 
deferring that until after this commit is complete).
   
   Either is fine with me.
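
   To illustrate the two quoted options (toy code, not the PR's implementation): (a) demote to double, which keeps the magnitude but loses exact precision, and (b) accumulate in a wider integer, standing in for a max-width decimal, and rescale after each multiplication so the result keeps the input scale.

   ```cpp
   #include <cstdint>
   #include <cmath>
   #include <iostream>
   #include <vector>

   struct Decimal {
     int64_t unscaled;  // value = unscaled * 10^(-scale)
     int32_t scale;
   };

   // Option (a): product as double.
   double ProductAsDouble(const std::vector<Decimal>& values) {
     double prod = 1.0;
     for (const Decimal& v : values) prod *= v.unscaled / std::pow(10.0, v.scale);
     return prod;
   }

   // Option (b): product in a wider accumulator (__int128, a GCC/Clang
   // extension, used here as a stand-in for a widened decimal), rescaled back
   // to the input scale after each multiplication.
   Decimal ProductWidened(const std::vector<Decimal>& values, int32_t scale) {
     __int128 pow10 = 1;
     for (int32_t i = 0; i < scale; ++i) pow10 *= 10;
     __int128 prod = pow10;                 // start at 1.0 expressed at `scale`
     for (const Decimal& v : values) {
       prod = prod * v.unscaled / pow10;    // multiply, then drop the extra scale
     }
     return Decimal{static_cast<int64_t>(prod), scale};
   }

   int main() {
     // 1.50 * 2.00 * 0.25 = 0.75
     std::vector<Decimal> vals = {{150, 2}, {200, 2}, {25, 2}};
     std::cout << ProductAsDouble(vals) << "\n";             // 0.75
     std::cout << ProductWidened(vals, 2).unscaled << "\n";  // 75 (scale 2)
   }
   ```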
   
   

