[
https://issues.apache.org/jira/browse/ARROW-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge closed ARROW-9779.
------------------------
Resolution: Won't Fix
There are other trade-offs here, and there is no consensus that this is worth it.
Spark also uses sum / count.
> [Rust] [DataFusion] Increase stability of average accumulator
> -------------------------------------------------------------
>
> Key: ARROW-9779
> URL: https://issues.apache.org/jira/browse/ARROW-9779
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust, Rust - DataFusion
> Reporter: Jorge
> Assignee: Jorge
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently, our method to compute the average is based on:
> 1. compute sum of all terms
> 2. compute count of all terms
> 3. compute sum / count
> However, the sum may overflow.
> A typical solution is an online formula, described e.g.
> [here|http://www.heikohoffmann.de/htmlthesis/node134.html], that updates the
> running mean directly and so keeps the intermediate numbers small.
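
For illustration, a minimal sketch of the online formula referred to above, assuming f64 inputs. The names OnlineMean, update, value and naive_mean are hypothetical and are not DataFusion's accumulator API. The update rule is mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n, so the full sum is never materialized.

```rust
/// Running mean that never stores the full sum (hypothetical type,
/// not part of DataFusion).
#[derive(Debug, Default)]
struct OnlineMean {
    count: u64,
    mean: f64,
}

impl OnlineMean {
    /// Fold one value into the running mean:
    /// mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
    fn update(&mut self, value: f64) {
        self.count += 1;
        self.mean += (value - self.mean) / self.count as f64;
    }

    /// Current mean, or None if no values have been seen.
    fn value(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.mean)
        }
    }
}

/// The sum / count approach described in the issue, for comparison.
fn naive_mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

fn main() {
    // Two values whose sum exceeds the f64 range.
    let values = [f64::MAX, f64::MAX];

    let mut acc = OnlineMean::default();
    for &v in values.iter() {
        acc.update(v);
    }

    // The naive sum overflows to infinity; the running mean stays finite.
    println!("naive:  {}", naive_mean(&values));
    println!("online: {:?}", acc.value());
}
```

The trade-off is one division per input value instead of a single division at the end, plus a different rounding pattern. The difference x_n - mean_{n-1} can itself still overflow for inputs of opposite sign near the limits of the f64 range, so this sketch reduces the overflow risk rather than eliminating it.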
--
This message was sent by Atlassian Jira
(v8.3.4#803005)