Abacn commented on code in PR #29808:
URL: https://github.com/apache/beam/pull/29808#discussion_r1437249953
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/util/HistogramData.java:
##########
@@ -231,6 +239,38 @@ private synchronized void updateStatistics(double value) {
sumOfSquaredDeviations += (value - mean) * (value - oldMean);
}
+ /**
+ * Increment the {@code numTopRecords} and update {@code topRecordsMean}
when a new overflow value
+ * is recorded. This function should only be called when a Histogram is
recording a value greater
+ * than the upper bound of it's largest bucket.
+ *
+ * @param value
+ */
+ private synchronized void recordTopRecordsValue(double value) {
+ numTopRecords++;
+ if (numTopRecords == 1) {
+ topRecordsMean = value;
+ } else {
+ topRecordsMean = topRecordsMean + (value - topRecordsMean) /
numTopRecords;
Review Comment:
Can we simply add up value every time, that is use "topRecordSum", and
calculate the mean by sum/counter when needed. As of floating point
arithmetics, this formula can lose precision over time. Also division is more
expensive than addition
##########
sdks/java/core/src/test/java/org/apache/beam/sdk/util/HistogramDataTest.java:
##########
@@ -357,27 +357,74 @@ public void testStatistics_sumOfSquaredDeviations() {
@Test
public void testGetAndReset_resetSucceeds() {
- HistogramData originalHistogram = HistogramData.linear(0, 10, 10);
+ // Records values from [20, 50) in three buckets that have width 10.
+ HistogramData originalHistogram = HistogramData.linear(20, 10, 3);
originalHistogram.record(15.0, 25.0, 35.0, 45.0);
originalHistogram.getAndReset();
- HistogramData emptyHistogramData = HistogramData.linear(0, 10, 10);
+ HistogramData emptyHistogramData = HistogramData.linear(20, 10, 3);
Review Comment:
Is there a reason that needs to change these parameters? The original
spacing should still pass the test, If not does it indicate some issue?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]