* Improve numerical accuracy of Univariate and BivariateRegression statistical
computations. Encapsulate basic double[] |-> double mean, variance, min, max
computations using improved formulas and add these to MathUtils. (probably
should add float[], int[], long[] versions as well.) Then refactor all
univariate implementations that use stored values (including UnivariateImpl
with finite window) to use the improved versions. -- Mark?  I am chasing down
the TAS reference to document the source of the _NR_ formula, which I will add
to the docs if someone else does the implementation.


I was starting to code the updating (storage-less) variance formula, based on
the Stanford article you cited, as a patch.  I believe the storage-using
corrected two-pass algorithm is pretty trivial to code once we feel we're on
solid ground with the reference to cite.
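For illustration, here is a rough Java sketch of one common form of the updating (storage-less) recurrence: keep a running mean and a running sum of squared deviations, and fold each new value into both. The class and method names are hypothetical and not part of the existing code, returning NaN for fewer than two values is an assumption, and this form may differ in detail from the one in the Stanford article.

```java
/** Hypothetical illustration class; not part of the existing code base. */
final class RunningVariance {
    private long n = 0;
    private double mean = 0.0;
    // Running sum of squared deviations from the current mean.
    private double sumSqDev = 0.0;

    /** Folds one value into the running statistics without storing it. */
    void add(double x) {
        n++;
        double delta = x - mean;      // deviation from the previous mean
        mean += delta / n;            // updated mean
        sumSqDev += delta * (x - mean); // uses both the old and the new mean
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n > 0 ? mean : Double.NaN;
    }

    /** Sample variance (divisor n - 1); NaN for fewer than two values (assumption). */
    double getVariance() {
        return n > 1 ? sumSqDev / (n - 1) : Double.NaN;
    }
}
```

Each call to add() costs a constant amount of work and memory, which is the point of the storage-less approach.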


OK. I finally got hold of the American Statistician article (had to resort to the old trundle-down-to-the-local-university-library method) and found lots of good stuff in it -- including a reference to Hanson's recursive formula (from the Stanford paper) and some empirical and theoretical results confirming that NR 14.1.8 is about the best that you can do for the stored case. The article also mentions a refinement that uses "pairwise summation" (essentially splitting the sample in two and computing the recursive sums in parallel), but this only pays off for large n. I propose that we use NR 14.1.8 as is for all stored computations. Here is good text for the reference:

Based on the <i>corrected two-pass algorithm</i> for computing the sample variance, as described in "Algorithms for Computing the Sample Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and Randall J. LeVeque, <i>The American Statistician</i>, 1983, Vol 37, No. 3. (Eq. (1.7) on page 243.)
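A minimal Java sketch of the stored-values computation, as I understand the corrected two-pass idea: compute the mean in a first pass, then in a second pass accumulate both the squared deviations and the plain deviations, and use the latter to correct for rounding error in the mean. The class name is hypothetical, and returning NaN for fewer than two values is an assumption rather than settled behavior.

```java
/** Hypothetical illustration class; not the actual MathUtils code. */
final class TwoPassVariance {

    private TwoPassVariance() {
    }

    /** First pass: arithmetic mean of the stored values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample variance (divisor n - 1); NaN for fewer than two values (assumption). */
    static double variance(double[] values) {
        int n = values.length;
        if (n < 2) {
            return Double.NaN;
        }
        double m = mean(values);
        double sumSqDev = 0.0; // sum of (x - m)^2
        double sumDev = 0.0;   // sum of (x - m); zero in exact arithmetic
        for (double v : values) {
            double dev = v - m;
            sumSqDev += dev * dev;
            sumDev += dev;
        }
        // The second term corrects for rounding error in the computed mean.
        return (sumSqDev - (sumDev * sumDev) / n) / (n - 1);
    }
}
```

In exact arithmetic the correction term vanishes, so the result reduces to the ordinary two-pass variance.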

The authors' empirical investigation uses a trick that I have thought about using to investigate the precision of our own code: implement an algorithm using both floats and doubles, and use the double computations to assess the stability of the float implementation. Might want to play with this a little.
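A rough Java sketch of how that float-versus-double comparison might look, using the corrected two-pass variance as the algorithm under test. All names here are hypothetical; the double result is treated as the reference only by assumption, since it is itself subject to rounding, and returning NaN for fewer than two values is likewise an assumption.

```java
/** Hypothetical illustration class for probing single-precision stability. */
final class PrecisionProbe {

    private PrecisionProbe() {
    }

    /** Corrected two-pass sample variance carried out entirely in float. */
    static float varianceFloat(float[] values) {
        int n = values.length;
        if (n < 2) {
            return Float.NaN;
        }
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        float m = sum / n;
        float sumSqDev = 0f;
        float sumDev = 0f;
        for (float v : values) {
            float dev = v - m;
            sumSqDev += dev * dev;
            sumDev += dev;
        }
        return (sumSqDev - (sumDev * sumDev) / n) / (n - 1);
    }

    /** The same algorithm on the same inputs, carried out in double. */
    static double varianceDouble(float[] values) {
        int n = values.length;
        if (n < 2) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (float v : values) {
            sum += v;
        }
        double m = sum / n;
        double sumSqDev = 0.0;
        double sumDev = 0.0;
        for (float v : values) {
            double dev = v - m;
            sumSqDev += dev * dev;
            sumDev += dev;
        }
        return (sumSqDev - (sumDev * sumDev) / n) / (n - 1);
    }

    /** Relative error of the float result, measured against the double result. */
    static double relativeError(float[] values) {
        double reference = varianceDouble(values);
        return Math.abs(varianceFloat(values) - reference) / Math.abs(reference);
    }
}
```

Running relativeError over samples with a large mean and a small spread would be one way to see where a float implementation starts to lose digits.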

Phil





