OK. I finally got hold of the American Statistician article (had to resort to the old trundle-down-to-the-local-university-library method) and found lots of good stuff in it -- including a reference to Hanson's recursive formula (from the Stanford paper) and some empirical and theoretical results confirming that NR 14.1.8 is about the best that you can do for the stored case. There is a refinement mentioned in which "pairwise summation" is used (essentially splitting the sample in two and computing the recursive sums in parallel), but the value of this only kicks in for large n. I propose that we use NR 14.1.8 as is for all stored computations. Good text for the reference is included below.

* Improve numerical accuracy of Univariate and BivariateRegression statistical computations. Encapsulate basic double[] |-> double mean, variance, min, max computations using improved formulas and add these to MathUtils. (We should probably add float[], int[], long[] versions as well.) Then refactor all univariate implementations that use stored values (including UnivariateImpl with finite window) to use the improved versions.

-- Mark? I am chasing down the TAS reference to document the source of the _NR_ formula, which I will add to the docs if someone else does the implementation.
I was starting to code the updating (storage-less) variance formula, based on the Stanford article you cited, as a patch. I believe the storage-using corrected two-pass algorithm is pretty trivial to code once we feel we're on solid ground with the reference to cite.
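For discussion, here is a minimal sketch of what a storage-less updating computation could look like. It uses the widely known Welford-style recurrence, which may not be the exact form given in the Stanford paper, and the class and method names (UpdatingVariance, add, getMean, getVariance) are hypothetical, not existing code in our tree.

```java
/** Hypothetical sketch: running mean and sample variance without storing values. */
final class UpdatingVariance {
    private long n = 0;          // number of values seen so far
    private double mean = 0.0;   // running mean
    private double m2 = 0.0;     // running sum of squared deviations from the mean

    /** Folds one new value into the running statistics. */
    void add(double x) {
        n++;
        double delta = x - mean;     // deviation from the old mean
        mean += delta / n;           // updated mean
        m2 += delta * (x - mean);    // uses both the old and the new mean
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Sample variance (divisor n - 1); NaN when fewer than two values were added. */
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    public static void main(String[] args) {
        UpdatingVariance stats = new UpdatingVariance();
        for (double v : new double[] {1, 2, 3, 4, 5}) {
            stats.add(v);
        }
        System.out.println("mean = " + stats.getMean());
        System.out.println("variance = " + stats.getVariance());
    }
}
```

The point of the recurrence is that only three numbers are kept between calls, so it would fit the no-storage Univariate case; whether it matches the cited formula term for term still needs checking against the paper.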
Based on the <i>corrected two-pass algorithm</i> for computing the sample variance, as described in "Algorithms for Computing the Sample Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and Randall J. LeVeque, <i>The American Statistician</i>, 1983, Vol. 37, No. 3. (Eq. (1.7) on page 243.)
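To make the proposal concrete, a rough sketch of the corrected two-pass computation for the stored case follows. The first pass computes the mean; the second accumulates both the squared deviations and the plain deviations, and the latter sum (zero in exact arithmetic) is used as a correction for rounding error in the mean. The class name CorrectedTwoPassVariance and the method name variance are placeholders, not a proposed MathUtils signature.

```java
/** Hypothetical sketch of the corrected two-pass sample variance for stored values. */
final class CorrectedTwoPassVariance {

    /** Returns the sample variance (divisor n - 1), or NaN for fewer than two values. */
    static double variance(double[] values) {
        int n = values.length;
        if (n < 2) {
            return Double.NaN;
        }
        // Pass 1: the mean.
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;

        // Pass 2: squared deviations plus the correction term.
        double sumSqDev = 0.0;  // sum of (x - mean)^2
        double sumDev = 0.0;    // sum of (x - mean); zero in exact arithmetic
        for (double v : values) {
            double dev = v - mean;
            sumSqDev += dev * dev;
            sumDev += dev;
        }
        return (sumSqDev - sumDev * sumDev / n) / (n - 1);
    }

    public static void main(String[] args) {
        // Small spread on top of a large offset, where naive formulas tend to struggle.
        double[] data = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("variance = " + variance(data));
    }
}
```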
The empirical investigation that the authors do uses the following trick that I have thought about using to investigate the precision in our stuff: implement an algorithm using both floats and doubles and use the double computations to assess stability of the algorithm implemented using floats. Might want to play with this a little.
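Something along these lines is what I have in mind for the float-versus-double experiment. This is only a sketch under my own assumptions: the test data (a small spread on a large offset), the class name FloatVsDoubleCheck and all method names are made up for illustration, and the "textbook" one-pass formula is included purely as a known-unstable comparison point.

```java
/** Hypothetical sketch: judge float implementations against a double reference. */
final class FloatVsDoubleCheck {

    /** Corrected two-pass variance carried out entirely in float arithmetic. */
    static float twoPassFloat(float[] x) {
        int n = x.length;
        float sum = 0f;
        for (float v : x) sum += v;
        float mean = sum / n;
        float sumSqDev = 0f, sumDev = 0f;
        for (float v : x) {
            float d = v - mean;
            sumSqDev += d * d;
            sumDev += d;
        }
        return (sumSqDev - sumDev * sumDev / n) / (n - 1);
    }

    /** One-pass "textbook" formula in float: (sum of squares - (sum)^2 / n) / (n - 1). */
    static float textbookFloat(float[] x) {
        int n = x.length;
        float sum = 0f, sumSq = 0f;
        for (float v : x) {
            sum += v;
            sumSq += v * v;
        }
        return (sumSq - sum * sum / n) / (n - 1);
    }

    /** Same corrected two-pass algorithm in double, used as the reference value. */
    static double twoPassDouble(float[] x) {
        int n = x.length;
        double sum = 0.0;
        for (float v : x) sum += v;
        double mean = sum / n;
        double sumSqDev = 0.0, sumDev = 0.0;
        for (float v : x) {
            double d = v - mean;
            sumSqDev += d * d;
            sumDev += d;
        }
        return (sumSqDev - sumDev * sumDev / n) / (n - 1);
    }

    static double relativeError(float computed, double reference) {
        return Math.abs(computed - reference) / Math.abs(reference);
    }

    /** Made-up data: 1000 values cycling through 10000, 10001, ..., 10004. */
    static float[] sampleData() {
        float[] x = new float[1000];
        for (int i = 0; i < x.length; i++) x[i] = 10000f + (i % 5);
        return x;
    }

    public static void main(String[] args) {
        float[] data = sampleData();
        double reference = twoPassDouble(data);
        System.out.println("two-pass float rel. error: " + relativeError(twoPassFloat(data), reference));
        System.out.println("textbook float rel. error: " + relativeError(textbookFloat(data), reference));
    }
}
```

The same harness could be pointed at any of our candidate formulas: run the float version, run the double version, and treat the gap as a rough measure of how much precision the algorithm gives away.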
Phil