[jira] [Commented] (HADOOP-18426) Improve the accuracy of MutableStat mean

ASF GitHub Bot (Jira) Mon, 29 Aug 2022 00:32:07 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597018#comment-17597018
 ]


ASF GitHub Bot commented on HADOOP-18426:
-----------------------------------------

Hexiaoqiao commented on code in PR #4811:
URL: https://github.com/apache/hadoop/pull/4811#discussion_r956979425


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/util/SampleStat.java:
##########
@@ -91,7 +91,7 @@ public SampleStat add(long nSamples, double x) {
     }
     else {
       // The Welford method for numerical stability
-      a1 = a0 + (x - a0) / numSamples;
+      a1 = a0 + (x - a0 * nSamples) / numSamples;

Review Comment:
   Great catch here. It makes sense to me.
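
   For readers following the hunk above, here is a minimal stand-alone sketch of the batched mean update. It assumes, based on the `add(long nSamples, double x)` signature and the changed line, that `x` is the sum of the `nSamples` samples being added. The class name `BatchMean` and its fields are hypothetical and are not part of SampleStat.

```java
// Hypothetical stand-alone sketch; this is NOT the actual SampleStat class.
final class BatchMean {
    private long numSamples = 0;
    private double mean = 0.0;

    /** Folds in a batch of nSamples samples whose sum is x (assumed meaning of x). */
    BatchMean add(long nSamples, double x) {
        if (nSamples <= 0) {
            return this; // nothing to fold in
        }
        numSamples += nSamples;
        // newMean = (mean * (numSamples - nSamples) + x) / numSamples
        //         = mean + (x - mean * nSamples) / numSamples
        mean = mean + (x - mean * nSamples) / numSamples;
        return this;
    }

    double mean() {
        return numSamples > 0 ? mean : 0.0;
    }

    long numSamples() {
        return numSamples;
    }
}
```

   With one sample of 2.0 followed by a batch of three samples summing to 18.0, the sketch yields a mean of 5.0 (20.0 over 4 samples). The removed expression, `a0 + (x - a0) / numSamples`, evaluates to 6.0 for the same inputs, because it treats the batch sum as a single sample.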



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/util/SampleStat.java:
##########
@@ -117,7 +117,7 @@ public double total() {
    * @return  the arithmetic mean of the samples
    */
   public double mean() {
-    return numSamples > 0 ? (total / numSamples) : 0.0;
+    return numSamples > 0 ? a1 : 0.0;

Review Comment:
   I am not sure why this line was updated in HADOOP-13804.
   @xkrogen, would you mind taking another look?





> Improve the accuracy of MutableStat mean
> ----------------------------------------
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy 
> when the sum of samples becomes too large. Processing each sample on its 
> own improves the accuracy of the mean.
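
To illustrate the accuracy concern in the description, the following is a small sketch of how a very large running total can absorb a small sample in double arithmetic, next to a per-sample running mean that never stores the total. The class `LargeSumDemo` and its method names are hypothetical and do not come from Hadoop; this is an illustration under those assumptions, not the patch itself.

```java
// Hypothetical illustration; names are not taken from the Hadoop code base.
final class LargeSumDemo {
    /** Returns true when adding sample leaves the double total unchanged. */
    static boolean absorbed(double total, double sample) {
        return total + sample == total;
    }

    /** Updates the mean one sample at a time, without keeping a running sum. */
    static double runningMean(double[] samples) {
        double mean = 0.0;
        long n = 0;
        for (double s : samples) {
            n++;
            mean += (s - mean) / n; // per-sample incremental update
        }
        return mean;
    }
}
```

Here `LargeSumDemo.absorbed(1e16, 1.0)` is true: doubles near 1e16 are spaced 2 apart, so the added 1.0 is rounded away. A mean derived from such a total can therefore drift, while the running mean stays close to the magnitude of the samples themselves.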



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
