Al Chou wrote:
Uh, did we drop the idea of using the "corrected two-pass" algorithm for the variance in the non-rolling case? I excerpted that thread below.
Al

No, we definitely should plan to include this in StoredUnivariate (and in UnivariateImpl when it requires storage). I've been wondering whether there is a way to combine the two approaches for the rolling implementation. I think what we're beginning to see is that the different implementations of Univariate (Rolling vs. Stored) each have their own benefits and limitations. I think what we'll find is that:

(1) UnivariateImpl is excellent for memory usage, with some limitations in accuracy due to roundoff error in extreme cases.

(2) StoredUnivariateImpl can support more accurate algorithms that reduce roundoff error, but with the trade-off of greater memory requirements.

Thanks for bringing the subject up again.
-Mark
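To make the trade-off in (1) and (2) concrete, here is a hypothetical sketch (not the actual UnivariateImpl code) of a memoryless accumulator that keeps only n, sum, and sumsq, together with the kind of extreme input where the textbook one-pass variance breaks down:

public class RollingSketch {
    private long n = 0;
    private double sum = 0.0;
    private double sumsq = 0.0;

    public void addValue(double x) {
        n++;
        sum += x;
        sumsq += x * x;
    }

    public double getMean() {
        return sum / n;
    }

    public double getVariance() {
        // "Textbook" expansion of the variance in terms of raw sums; when sumsq
        // and sum*sum/n are both huge and nearly equal, the subtraction cancels
        // most of the significant digits.
        return n > 1 ? (sumsq - (sum * sum) / n) / (n - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        RollingSketch s = new RollingSketch();
        // A large offset with a tiny spread makes the cancellation visible.
        for (double x : new double[] {1.0e9 + 1, 1.0e9 + 2, 1.0e9 + 3}) {
            s.addValue(x);
        }
        System.out.println("mean = " + s.getMean() + ", variance = " + s.getVariance());
    }
}

For that input the exact variance is 1.0, but the sum/sumsq form cancels almost all of the significant digits and can come out badly wrong (typically 0.0 here), which is the roundoff limitation point (1) is getting at.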
Date: Mon, 26 May 2003 18:28:45 -0700 (PDT)
From: Al Chou <[EMAIL PROTECTED]>
Subject: [math] rolling formulas for statistics (was RE: Greetings from a newcomer (b
--- Phil Steitz <[EMAIL PROTECTED]> wrote:
Al Chou wrote:
--- Phil Steitz <[EMAIL PROTECTED]> wrote:
Brent Worden wrote:
Mark R. Diggory wrote:
There are easy formulas for skewness and kurtosis based on the central moments which could be used for the stored, univariate implementations: http://mathworld.wolfram.com/Skewness.html http://mathworld.wolfram.com/Kurtosis.html
As for the rolling implementations, there might be some more research
involved before using this method because of their memoryless property.
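For the stored case, here is a minimal sketch of what skewness and kurtosis computed from the central moments could look like, along the lines of the MathWorld pages linked above. The particular conventions chosen (plain moment ratios, excess kurtosis, no sample-size bias correction) are an assumption on my part; the MathWorld pages list the variants:

public final class CentralMomentsSketch {

    // k-th central moment: (1/n) * sum((x[i] - mean)^k)
    static double centralMoment(double[] x, int k) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double m = 0.0;
        for (double v : x) {
            m += Math.pow(v - mean, k);
        }
        return m / x.length;
    }

    // Skewness as the moment ratio m3 / m2^(3/2).
    static double skewness(double[] x) {
        return centralMoment(x, 3) / Math.pow(centralMoment(x, 2), 1.5);
    }

    // Kurtosis as m4 / m2^2 - 3 (excess kurtosis; 0 for a normal distribution).
    static double kurtosis(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 4) / (m2 * m2) - 3.0;
    }

    public static void main(String[] args) {
        double[] sample = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        System.out.println("skewness = " + skewness(sample));
        System.out.println("excess kurtosis = " + kurtosis(sample));
    }
}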
But for starters, the sum and sumsq can easily be replaced with their central moment counterparts, mean and variance. There are formulas that update those metrics when a new value is added. Weisberg's "Applied Linear Regression" outlines two such updating formulas for mean and sum of squares which are numerically superior to direct computation and the raw moment methods.

Why exactly are these numerically superior? For what class of examples? Looks like a lot more operations to me, especially in the UnivariateImpl case where the mean and variance are computed only when demanded -- i.e., there is no requirement to generate mean[0], mean[1], ...etc. I understand that adding (or better, swapping) operations can sometimes add precision, but I am having a hard time seeing exactly where the benefit is in this case, especially given the amount of additional computation required.
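To make the "updating formulas" concrete, here is a sketch of the standard recurrence of this kind (often attributed to Welford/West; I am not claiming it is Weisberg's exact formulation). The mean and the sum of squared deviations are updated as each value arrives, so the huge, nearly cancelling sums of the raw sum/sumsq approach never appear:

public final class UpdatingFormulasSketch {
    private long n = 0;
    private double mean = 0.0;
    private double sumSqDev = 0.0; // running sum of squared deviations from the current mean

    public void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;                // updated mean
        sumSqDev += delta * (x - mean);   // uses the deviation from both the old and the new mean
    }

    public double getMean() {
        return mean;
    }

    public double getVariance() {
        return n > 1 ? sumSqDev / (n - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        UpdatingFormulasSketch stats = new UpdatingFormulasSketch();
        for (double x : new double[] {1.0e9 + 1, 1.0e9 + 2, 1.0e9 + 3}) {
            stats.addValue(x);
        }
        // The exact variance of this data is 1.0, even though the values are huge.
        System.out.println("mean = " + stats.getMean() + ", variance = " + stats.getVariance());
    }
}

The benefit shows up mainly when the mean is large relative to the spread of the data, which is exactly the situation the Numerical Recipes excerpt below describes.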
_Numerical Recipes in C_, 2nd ed., p. 613 (http://lib-www.lanl.gov/numerical/bookcpdf/c14-1.pdf) explains:
"Many textbooks use the binomial theorem to expand out the definitions [of
statistical quantities] into sums of various powers of the data, ...[,] but
this can magnify the roundoff error by a large factor... . A clever way to
minimize roundoff error, especially for large samples, is to use the
two-pass algorithm [1]: First calculate x[bar, the mean of x], thencalculate
[formula for variance in terms of x - xbar.] ..."Thany you, Al!
[1] "Algorithms for Computing the Sample Variance: Analysis and
Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. 1983, American
Statistician, vol. 37, pp. 242-247.
Thank you, Al! I am convinced by this that, at least for the variance and higher order moments, whenever we actually store the values (that would include UnivariateImpl with a finite window), we should use the "corrected one [sic] pass" formula (14.1.8) from http://lib-www.lanl.gov/numerical/bookcpdf/c14-1.pdf. It is a clever idea to explicitly correct for error in the mean.
===== Albert Davidson Chou
Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
