Phil Steitz wrote:
> Phil Steitz wrote:
>
>> [EMAIL PROTECTED] wrote:
>>
>>> Phil Steitz wrote:
>>>
>>>> Since xbar = sum/n, the change has no impact on which sums are
>>>> computed or squared. Instead of (sum/n)*(sum/n)*n your change just
>>>> computes sum**2/n. The difference is that you are a) eliminating
>>>> one division by n and one multiplication by n (no doubt a good
>>>> thing) and b) replacing direct multiplication with pow(-,2). The
>>>> second of these used to be discouraged, but I doubt it makes any
>>>> difference with modern compilers. I would suggest collapsing the
>>>> denominators and doing just one cast -- i.e., use
>>>>
>>>> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>>>>
>>>> If
>>>>
>>>> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1))) or
>>>>
>>>> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1))) give
>>>>
>>>> better accuracy, use one of them; but I would favor (1) since it
>>>> will be able to handle larger positive sums.
>>>>
>>>> I would also recommend forcing getVariance() to return 0 if the
>>>> result is negative (which can happen in the right circumstances for
>>>> any of these formulas).
>>>>
>>>> Phil
>>>
>>> Collapsing is definitely good, but I'm not sure about these
>>> equations. From my experience, approaching (2) would look something
>>> more like
>>>
>>> variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));
>>>
>>> see (5) in http://mathworld.wolfram.com/k-Statistic.html
>>
>> That formula is the formula for the 2nd k-statistic, which is *not*
>> the same as the sample variance. The standard formula for the sample
>> variance is presented in equation (3) here:
>> http://mathworld.wolfram.com/SampleVariance.html or in any elementary
>> statistics text. Formulas (1)-(3) above (and the current
>> implementation) are all equivalent to the standard definition.
>>
>> What you have above is not. The relation between the variance and the
>> second k-statistic is presented in (9) on
>> http://mathworld.wolfram.com/k-Statistic.html
>
> I just realized that I misread Wolfram's definitions. What he is
> defining as the 2nd k-statistic is the correct formula for the sample
> variance. I am also missing some parentheses above. Your formula is
> correct. Sorry.
>
> Phil
>
No problem, I just wanted to make sure we're all on the same page; there
are always many times I am wrong too.

Thanks for double checking,
-Mark
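
For reference, the formula the thread settles on, variance = (n*sumsq - sum*sum) / (n*(n - 1)), could be sketched in Java roughly as below. This is only an illustration, not the code under discussion: the class and method names (VarianceSketch, onePassVariance, twoPassVariance) are hypothetical, the clamp to zero follows the getVariance() suggestion quoted above, and returning 0 for fewer than two values is an assumption made just for this sketch. The two-pass method is included only as a cross-check against the textbook definition.

```java
/** Hypothetical helper for illustration; not the library's actual implementation. */
final class VarianceSketch {

    /** One-pass form agreed on in the thread: (n*sumsq - sum*sum) / (n*(n-1)). */
    static double onePassVariance(double[] values) {
        int n = values.length;
        if (n < 2) {
            return 0.0; // assumption for this sketch: fewer than two values gives 0
        }
        double sum = 0.0;
        double sumsq = 0.0;
        for (double v : values) {
            sum += v;
            sumsq += v * v;
        }
        // Convert n once so that n * (n - 1) is evaluated in double, not int.
        double dn = n;
        double variance = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        // Rounding can push the result slightly below zero; clamp as suggested in the thread.
        return variance < 0.0 ? 0.0 : variance;
    }

    /** Two-pass textbook definition: sum((x - mean)^2) / (n - 1). */
    static double twoPassVariance(double[] values) {
        int n = values.length;
        if (n < 2) {
            return 0.0; // same assumption as above
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (double v : values) {
            double d = v - mean;
            squaredDeviations += d * d;
        }
        return squaredDeviations / (n - 1);
    }

    public static void main(String[] args) {
        double[] data = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        System.out.println("one-pass: " + onePassVariance(data));
        System.out.println("two-pass: " + twoPassVariance(data));
    }
}
```

On small, well-scaled data the two methods agree. The one-pass form subtracts two large, nearly equal quantities when the mean is large relative to the spread, which is where the negative results mentioned above can come from and why the clamp is kept.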
