aherbert commented on code in PR #51:
URL: https://github.com/apache/commons-statistics/pull/51#discussion_r1283226425


##########
commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/FirstMoment.java:
##########
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.statistics.descriptive;
+
+/**
+ * Computes the first moment (arithmetic mean) using the definitional formula:
+ *
+ * <p> mean = sum(x_i) / n
+ *
+ * <p> To limit numeric errors, the value of the statistic is computed using 
the
+ * following recursive updating algorithm:
+ * <ol>
+ * <li>Initialize <code>m = </code> the first value</li>
+ * <li>For each additional value, update using <br>
+ *   <code>m = m + (new value - m) / (number of observations)</code></li>
+ * </ol>
+ *
+ * <p>Returns <code>Double.NaN</code> if the dataset is empty. Note that
+ * <code>NaN</code> may also be returned if the input includes NaN and / or 
infinite
+ *  values of opposite sign.
+ *
+ * <p><strong>Note that this implementation is not synchronized.</strong> If
+ * multiple threads access an instance of this class concurrently, and at least
+ * one of the threads invokes the <code>increment()</code> or
+ * <code>clear()</code> method, it must be synchronized externally.
+ *
+ * <p>However, it is safe to use <code>accept()</code> and 
<code>combine()</code>
+ * as <code>accumulator</code> and <code>combiner</code> functions of
+ * {@link java.util.stream.Collector Collector} on a parallel stream,
+ * because the parallel implementation of {@link 
java.util.stream.Stream#collect Stream.collect()}
+ * provides the necessary partitioning, isolation, and merging of results for
+ * safe and efficient parallel execution.
+ */
+class FirstMoment implements DoubleStatistic, 
DoubleStatisticAccumulator<FirstMoment> {
+    /** Count of values that have been added. */
+    private long n;
+
+    /** First moment of values that have been added. */
+    private double m1;
+
+    /** Running count of <code>Double.POSITIVE_INFINITY</code> values. */
+    private long posInfCount;

Review Comment:
   These counts are not required. Having to track these counters adds branch 
points to the code that are not required. I understand what you are trying to 
do by tracking if the result has a accumulated opposite infinities. However 
this can be done by just summing the values up.
   
   I added a double to track the sum of the current values. If m1 is not finite 
then some part of the computation encountered a non-finite value. You can then 
return the sum as it will not be finite and in this case the non-finite sum 
result is the same as the mean.
   
   All we have to do is ensure that m1 never overflows unless a non-finite 
value is encountered. So we require a test that combine two instances that are 
currently +MAX_VALUE and -MIN_VALUE to check this is correctly handled.
   
   Here is the implementation that passes all your current tests:
   ```java
   class FirstMoment implements DoubleStatistic, 
DoubleStatisticAccumulator<FirstMoment> {
       private long n;
       private double sum;
       private double m1;
       private double dev;
       private double nDev;
   
       FirstMoment() {}
   
       FirstMoment(final double m1, final long n) {
           this.m1 = m1;
           this.n = n;
       }
   
       public void accept(double value) {
           n++;
           sum += value;
           // prevent overflow at the cost of precision on sub-normals
           dev = (value * 0.5 - m1 * 0.5) * 2;
           nDev = dev / n;
           m1 += nDev;
       }
   
       public double getAsDouble() {
           if (Double.isFinite(m1)) {
               return n == 0 ? Double.NaN : m1;
           }
           return sum;
       }
   
       public FirstMoment combine(FirstMoment other) {
           if (n == 0) {
               n = other.n;
               m1 = other.m1;
               sum = other.sum;
               // copy dev and nDev - not currently required
           } else if (other.n != 0) {
               n += other.n;
               sum += other.sum;
               // prevent overflow at the cost of precision on sub-normals
               dev = (other.m1 * 0.5 - m1 * 0.5) * 2;
               nDev = dev * ((double) other.n / n);
               m1 += nDev;
           }
           return this;
       }
   
       long getN() {
           return n;
       }
   }
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to