aherbert commented on code in PR #51: URL: https://github.com/apache/commons-statistics/pull/51#discussion_r1283226425
########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/FirstMoment.java: ########## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +/** + * Computes the first moment (arithmetic mean) using the definitional formula: + * + * <p> mean = sum(x_i) / n + * + * <p> To limit numeric errors, the value of the statistic is computed using the + * following recursive updating algorithm: + * <ol> + * <li>Initialize <code>m = </code> the first value</li> + * <li>For each additional value, update using <br> + * <code>m = m + (new value - m) / (number of observations)</code></li> + * </ol> + * + * <p>Returns <code>Double.NaN</code> if the dataset is empty. Note that + * <code>NaN</code> may also be returned if the input includes NaN and / or infinite + * values of opposite sign. + * + * <p><strong>Note that this implementation is not synchronized.</strong> If + * multiple threads access an instance of this class concurrently, and at least + * one of the threads invokes the <code>increment()</code> or + * <code>clear()</code> method, it must be synchronized externally. + * + * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code> + * as <code>accumulator</code> and <code>combiner</code> functions of + * {@link java.util.stream.Collector Collector} on a parallel stream, + * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()} + * provides the necessary partitioning, isolation, and merging of results for + * safe and efficient parallel execution. + */ +class FirstMoment implements DoubleStatistic, DoubleStatisticAccumulator<FirstMoment> { + /** Count of values that have been added. */ + private long n; + + /** First moment of values that have been added. */ + private double m1; + + /** Running count of <code>Double.POSITIVE_INFINITY</code> values. */ + private long posInfCount; Review Comment: These counts are not required. Having to track these counters adds branch points to the code that are not required. I understand what you are trying to do by tracking if the result has a accumulated opposite infinities. However this can be done by just summing the values up. I added a double to track the sum of the current values. If m1 is not finite then some part of the computation encountered a non-finite value. You can then return the sum as it will not be finite and in this case the non-finite sum result is the same as the mean. All we have to do is ensure that m1 never overflows unless a non-finite value is encountered. So we require a test that combine two instances that are currently +MAX_VALUE and -MIN_VALUE to check this is correctly handled. Here is the implementation that passes all your current tests: ```java class FirstMoment implements DoubleStatistic, DoubleStatisticAccumulator<FirstMoment> { private long n; private double sum; private double m1; private double dev; private double nDev; FirstMoment() {} FirstMoment(final double m1, final long n) { this.m1 = m1; this.n = n; } public void accept(double value) { n++; sum += value; // prevent overflow at the cost of precision on sub-normals dev = (value * 0.5 - m1 * 0.5) * 2; nDev = dev / n; m1 += nDev; } public double getAsDouble() { if (Double.isFinite(m1)) { return n == 0 ? Double.NaN : m1; } return sum; } public FirstMoment combine(FirstMoment other) { if (n == 0) { n = other.n; m1 = other.m1; sum = other.sum; // copy dev and nDev - not currently required } else if (other.n != 0) { n += other.n; sum += other.sum; // prevent overflow at the cost of precision on sub-normals dev = (other.m1 * 0.5 - m1 * 0.5) * 2; nDev = dev * ((double) other.n / n); m1 += nDev; } return this; } long getN() { return n; } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
