My current work on Certified values exposes a limitation in how UnivariateImpl
calculates the variance: certain data (an example is shown further below)
induces roundoff error. The current implementation is:
public double getVariance() {
    double variance = Double.NaN;
    if (n == 1) {
        variance = 0.0;
    } else if (n > 1) {
        double xbar = getMean();
        // "textbook" form: (sum of squares - n * mean^2) / (n - 1)
        variance = (sumsq - xbar * xbar * ((double) n)) / ((double) (n - 1));
    }
    return variance;
}
Basically, when "sumsq" approaches "xbar*xbar*((double) n)", the numerator in
the variance calculation can become negative, so getVariance() returns a
negative value and Math.sqrt then goes on to return NaN for the standard
deviation. Of course, negative variances are not in the domain of possible
return values for variance. The test where I discovered this uses the
NumAcc4.txt dataset, which is now under
/src/test/org/apache/commons/math/stat/data.
That data set basically looks like the following:
10000000.2
10000000.1
10000000.3
10000000.1
10000000.3
10000000.1
10000000.3
10000000.1
10000000.3
...
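To see the cancellation outside of UnivariateImpl, here is a small standalone
sketch. The generated array only mimics this structure (roughly a thousand rows
with a large offset and a 0.1 spread); it is not the actual NumAcc4.txt file,
and the local sum/sumsq variables just stand in for the class's running sums:

public class CurrentFormulaDemo {
    public static void main(String[] args) {
        // Large offset, tiny spread: 10000000.2 followed by alternating
        // 10000000.1 and 10000000.3 values, in the spirit of NumAcc4.
        int n = 1001;
        double sum = 0.0;
        double sumsq = 0.0;
        for (int i = 0; i < n; i++) {
            double v = (i == 0) ? 10000000.2 : (i % 2 == 1 ? 10000000.1 : 10000000.3);
            // Accumulate the same running sums UnivariateImpl keeps.
            sum += v;
            sumsq += v * v;
        }

        double xbar = sum / n;
        // sumsq and xbar*xbar*n agree in nearly all of their significant
        // digits, so the subtraction cancels almost everything and can
        // come out negative (which is what happened in the NumAcc4 test).
        double variance = (sumsq - xbar * xbar * n) / (n - 1);

        // The exact sample variance of this pattern is 0.01 (std dev 0.1).
        System.out.println("variance (current formula) = " + variance);
        System.out.println("std dev                    = " + Math.sqrt(variance));
    }
}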
I think the following alternative approach handles extreme cases like this more
stably:
public double getVariance() {
    double variance = Double.NaN;
    if (n == 1) {
        variance = 0.0;
    } else if (n > 1) {
        // numerator built from the running sum instead of the mean:
        // (sum of squares - (sum)^2 / n) / (n - 1)
        variance = (sumsq - (Math.pow(sum, 2) / (double) n)) / (double) (n - 1);
    }
    return variance;
}
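For comparison, the two numerators can be pulled out into small helpers and
evaluated on the same accumulated sums. Again, this is only an illustrative
sketch with made-up NumAcc4-style values, not the actual test code:

public class VarianceComparison {

    // Current form: numerator built from the mean.
    static double viaMean(double sum, double sumsq, int n) {
        double xbar = sum / n;
        return (sumsq - xbar * xbar * n) / (n - 1);
    }

    // Proposed form: numerator built from the running sum.
    static double viaSum(double sum, double sumsq, int n) {
        return (sumsq - (sum * sum) / n) / (n - 1);
    }

    public static void main(String[] args) {
        int n = 1001;
        double sum = 0.0;
        double sumsq = 0.0;
        for (int i = 0; i < n; i++) {
            double v = (i == 0) ? 10000000.2 : (i % 2 == 1 ? 10000000.1 : 10000000.3);
            sum += v;
            sumsq += v * v;
        }
        // The exact value for this pattern is 0.01; printing both results
        // side by side shows how each expression is affected by roundoff.
        System.out.println("via mean: " + viaMean(sum, sumsq, n));
        System.out.println("via sum : " + viaSum(sum, sumsq, n));
    }
}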
In exact arithmetic, "sumsq - (sum*sum)/n" is just the sum of squared
deviations from the mean, so it can never be negative. The roundoff error in
the "sum of squares" is always greater than that of the "square of the sum",
so this form does not induce negative numbers and instability when dealing
with extreme situations like this. I think that while one should try to avoid
situations where roundoff errors arise, dealing with such cases in a stable
fashion should help to produce a robust algorithm.
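That identity can be checked with small, exactly representable values
(made-up numbers, purely for illustration):

public class IdentityCheck {
    public static void main(String[] args) {
        // Small integer data, so every intermediate result is exact in double.
        double[] x = {2, 4, 4, 4, 5, 5, 7, 9};
        int n = x.length;

        double sum = 0.0;
        double sumsq = 0.0;
        for (double v : x) {
            sum += v;
            sumsq += v * v;
        }
        double xbar = sum / n; // 5.0 exactly

        double viaSums = sumsq - (sum * sum) / n; // 232 - 200 = 32
        double viaDeviations = 0.0;
        for (double v : x) {
            viaDeviations += (v - xbar) * (v - xbar);
        }

        // Both print 32.0: the proposed numerator equals the sum of squared
        // deviations from the mean, which is never negative in exact arithmetic.
        System.out.println(viaSums + " == " + viaDeviations);
    }
}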
I wanted to open this to discussion prior to committing this change.
Mark