Phil Steitz wrote:
[EMAIL PROTECTED] wrote:

Phil Steitz wrote:

Since xbar = sum/n, the change has no impact on which sums are computed or squared. Instead of (sum/n)*(sum/n)*n, your change just computes sum**2/n. The difference is that you are a) eliminating one division by n and one multiplication by n (no doubt a good thing) and b) replacing direct multiplication with pow(-,2). The second of these used to be discouraged, but I doubt it makes any difference with modern compilers. I would suggest collapsing the denominators and doing just one cast -- i.e., use

(1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))

If

(2) variance = sumsq - (sum * sum)/(double) (n * (n - 1))) or

(3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1))) give

better accuracy, use one of them; but I would favor (1) since it will be able to handle larger positive sums.

I would also recommend forcing getVariance() to return 0 if the result is negative (which can happen in the right circumstances for any of these formulas).

Phil



Collapsing is definitely good, but I'm not sure about these equations. From my experience, an approach along the lines of (2) would look something more like


variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));

see (5) in http://mathworld.wolfram.com/k-Statistic.html


That formula is the formula for the 2nd k-statistic, which is *not* the same as the sample variance. The standard formula for the sample variance is presented in equation (3) here: http://mathworld.wolfram.com/SampleVariance.html or in any elementary statistics text. Formulas (1)-(3) above (and the current implementation) are all equivalent to the standard definition.

What you have above is not. The relation between the variance and the second k-statistic is presented in (9) on http://mathworld.wolfram.com/k-Statistic.html

I just realized that I misread Wolfram's definitions. What he is defining as the 2nd k-statistic is the correct formula for the sample variance. I am also missing some parentheses above. Your formula is correct. Sorry.


Phil
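For readers following the correction above, here is a minimal Java sketch that checks the formulas against each other on a small data set. The class and method names (VarianceFormulas, twoPass, onePass, formula1AsTyped) are hypothetical and are not taken from the code under discussion. One deliberate difference from the formula in the thread: the denominator is computed in double rather than int, which is an assumption made here to avoid int overflow for large n.

```java
final class VarianceFormulas {

    /** Textbook two-pass definition: squared deviations from the mean, divided by n - 1. */
    static double twoPass(double[] x) {
        int n = x.length;
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (double v : x) {
            double d = v - mean;
            squaredDeviations += d * d;
        }
        return squaredDeviations / (n - 1);
    }

    /** One-pass form proposed in the thread: (n*sumsq - sum*sum) / (n*(n-1)). */
    static double onePass(double[] x) {
        int n = x.length;
        double sum = 0.0;
        double sumsq = 0.0;
        for (double v : x) {
            sum += v;
            sumsq += v * v;
        }
        // Denominator computed in double (assumption of this sketch, not from the thread).
        return (n * sumsq - sum * sum) / ((double) n * (n - 1));
    }

    /** Formula (1) exactly as typed in the thread, i.e. without the missing parentheses. */
    static double formula1AsTyped(double[] x) {
        int n = x.length;
        double sum = 0.0;
        double sumsq = 0.0;
        for (double v : x) {
            sum += v;
            sumsq += v * v;
        }
        return sumsq - sum * (sum / (double) (n * (n - 1)));
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5};
        System.out.println(twoPass(data));         // 2.5
        System.out.println(onePass(data));         // 2.5
        System.out.println(formula1AsTyped(data)); // 43.75
    }
}
```

On this data set the one-pass form agrees with the two-pass definition, while formula (1) as typed does not, which is consistent with the missing parentheses acknowledged above.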



As you've stated, this approach seems to have more than just one benefit. I'll also put in a test for negative values and return 0.0 if they are present.


-Mark
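The negative-value guard described above could look roughly like the following Java sketch. It is only an illustration: the class name ClampedVariance and the method signatures are hypothetical, and the inputs in main are hand-picked to imitate a sumsq that has lost a little precision, not taken from real data. Samples with n < 2 are not handled specially here; the division then yields NaN, which is passed through.

```java
final class ClampedVariance {

    /** One-pass form from the thread, with the denominator computed in double. */
    static double rawVariance(long n, double sum, double sumsq) {
        return (n * sumsq - sum * sum) / ((double) n * (n - 1));
    }

    /** Same computation, but a negative result (rounding error) is reported as 0.0. */
    static double variance(long n, double sum, double sumsq) {
        double v = rawVariance(n, sum, sumsq);
        return v < 0.0 ? 0.0 : v;
    }

    public static void main(String[] args) {
        // Well-behaved case: data 1..5 gives sum = 15 and sumsq = 55.
        System.out.println(variance(5, 15.0, 55.0)); // 2.5

        // Hand-picked inputs imitating rounding error: three values near 1.0
        // whose accumulated sumsq came out slightly too small.
        System.out.println(rawVariance(3, 3.0, 2.9999999) < 0.0); // true
        System.out.println(variance(3, 3.0, 2.9999999));          // 0.0
    }
}
```

Clamping hides the symptom rather than the cancellation that causes it, so it is best read as a safety net for the one-pass formula rather than a fix for its accuracy.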


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]












