Phil Steitz wrote:
> Phil Steitz wrote:
> 
>> [EMAIL PROTECTED] wrote:
>>
>>> Phil Steitz wrote:
>>>
>>>> Since xbar = sum/n, the change has no impact on which sums are 
>>>> computed or squared. Instead of (sum/n)*(sum/n)*n your change just 
>>>> computes sum**2/n.  The difference is that you are a) eliminating 
>>>> one division by n and one multiplication by n (no doubt a good 
>>>> thing) and b) replacing direct multiplication with pow(-,2). The 
>>>> second of these used to be discouraged, but I doubt it makes any 
>>>> difference with modern compilers.  I would suggest collapsing the 
>>>> denominators and doing just one cast -- i.e., use
>>>>
>>>> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>>>>
>>>> If
>>>>
>>>> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1))) or
>>>>
>>>> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1))) give
>>>>
>>>> better accuracy, use one of them; but I would favor (1) since it 
>>>> will be able to handle larger positive sums.
>>>>
>>>> I would also recommend forcing getVariance() to return 0 if the 
>>>> result is negative (which can happen in the right circumstances for 
>>>> any of these formulas).
>>>>
>>>> Phil
>>>
>>>
>>>
>>>
>>> collapsing is definitely good, but I'm not sure about these 
>>> equations, from my experience, approaching (2) would look something 
>>> more like
>>>
>>> variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));
>>>
>>> see (5) in http://mathworld.wolfram.com/k-Statistic.html
>>
>>
>>
>> That formula is the formula for the 2nd k-statistic, which is *not* 
>> the same as the sample variance.  The standard formula for the sample 
>> variance is presented in equation (3) here: 
>> http://mathworld.wolfram.com/SampleVariance.html or in any elementary 
>> statistics text. Formulas (1)-(3) above (and the current 
>> implementation) are all equivalent to the standard definition.
>>
>> What you have above is not.  The relation between the variance and the 
>> second k-statistic is presented in (9) on 
>> http://mathworld.wolfram.com/k-Statistic.html
> 
> 
> I just realized that I misread Wolfram's definitions. What he is 
> defining as the 2nd k-statistic is the correct formula for the sample 
> variance.  I am also missing some parentheses above.  Your formula is 
> correct.  Sorry.
> 
> Phil
> 

No problem, I just wanted to make sure we're all on the same page. There are 
plenty of times when I am wrong too.
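
For reference, here is a minimal sketch of the formula we settled on, 
(n*sumsq - sum*sum) / (n*(n-1)), with the clamp to 0 that Phil suggested 
for negative results. The class and method names are hypothetical, and 
returning NaN for n < 2 is my own assumption; this is not the current 
implementation.

```java
final class VarianceSketch {

    /**
     * Sample variance from running sums, using
     * (n * sumsq - sum * sum) / (n * (n - 1)).
     * Hypothetical helper for illustration only.
     */
    public static double variance(double sum, double sumsq, long n) {
        if (n < 2) {
            // Assumption: treat the variance as undefined for fewer than two values.
            return Double.NaN;
        }
        // Cast once, up front, so n * (n - 1) is computed in floating point.
        double dn = (double) n;
        double v = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        // Rounding error can push the result slightly below zero; clamp it to 0.
        return v < 0.0 ? 0.0 : v;
    }

    public static void main(String[] args) {
        double[] data = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        double sum = 0.0;
        double sumsq = 0.0;
        for (double x : data) {
            sum += x;
            sumsq += x * x;
        }
        // sum = 40, sumsq = 232, n = 8, so the exact value is 256/56 = 32/7.
        System.out.println(variance(sum, sumsq, data.length));
    }
}
```

This uses direct multiplication rather than Math.pow(sum, 2). As noted 
earlier in the thread, sum * sum can overflow for very large sums.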

Thanks for double checking,
-Mark
