[ https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597018#comment-17597018 ]
ASF GitHub Bot commented on HADOOP-18426:
-----------------------------------------

Hexiaoqiao commented on code in PR #4811:
URL: https://github.com/apache/hadoop/pull/4811#discussion_r956979425

##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/util/SampleStat.java:
##########
@@ -91,7 +91,7 @@ public SampleStat add(long nSamples, double x) {
     }
     else {
       // The Welford method for numerical stability
-      a1 = a0 + (x - a0) / numSamples;
+      a1 = a0 + (x - a0 * nSamples) / numSamples;

Review Comment:
   Great catch here. It makes sense to me.

##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/util/SampleStat.java:
##########
@@ -117,7 +117,7 @@ public double total() {
    * @return the arithmetic mean of the samples
    */
   public double mean() {
-    return numSamples > 0 ? (total / numSamples) : 0.0;
+    return numSamples > 0 ? a1 : 0.0;

Review Comment:
   I am not sure why this was updated in HADOOP-13804. @xkrogen, would you mind taking another look?

> Improve the accuracy of MutableStat mean
> ----------------------------------------
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples can grow too large. We can process each sample
> on its own to improve the accuracy of the mean.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
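
For readers following the first hunk, here is a minimal sketch of why the batch update needs the `a0 * nSamples` term. It assumes that `x` is the sum of the `nSamples` values being added, which is what the `total / numSamples` expression in the old `mean()` suggests. The class `RunningMeanSketch` and its fields are hypothetical names used only for illustration; this is not the actual `SampleStat` code and it leaves out the variance terms.

```java
// Hypothetical stand-in for illustration only; not Hadoop's SampleStat.
final class RunningMeanSketch {
    private long numSamples = 0;   // total number of samples seen so far
    private double mean = 0.0;     // running mean of all samples seen so far

    /**
     * Folds in a batch of nSamples samples whose values add up to sum.
     * Assumption: sum is the batch total, not a single sample value.
     */
    RunningMeanSketch add(long nSamples, double sum) {
        if (nSamples <= 0) {
            return this;           // nothing to fold in
        }
        numSamples += nSamples;
        // newMean = (oldMean * oldCount + sum) / newCount, rearranged so the
        // running total never has to be stored:
        //   newMean = oldMean + (sum - oldMean * nSamples) / newCount
        mean = mean + (sum - mean * nSamples) / numSamples;
        return this;
    }

    double mean() {
        return numSamples > 0 ? mean : 0.0;
    }

    long numSamples() {
        return numSamples;
    }

    public static void main(String[] args) {
        RunningMeanSketch s = new RunningMeanSketch();
        s.add(1, 10.0);            // one sample: 10
        s.add(3, 90.0);            // three samples summing to 90 (e.g. 20, 30, 40)
        System.out.println(s.mean()); // prints 25.0, i.e. (10 + 90) / 4
    }
}
```

With the same inputs, the unweighted update `mean + (sum - mean) / numSamples` would give 10 + 80 / 4 = 30.0, which is not the mean of the four samples. When `nSamples` is 1 the two formulas coincide, so the difference only shows up for batched adds.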