Re: [Jgeneral] "Standard Deviation"

Raul Miller Fri, 29 Jun 2007 10:38:02 -0700

On 6/29/07, John Randall <[EMAIL PROTECTED]> wrote:

Raul Miller wrote:
> Sure -- that's why I called that numerical model a way of "checking
> my work" instead of a "proof".  But, if the math is valid, then the math
> should remain valid when I plug in the numbers.


This is where the problem lies.

Let S^2=(1/n)\sum (X_i-\mu)^2.  Then E(S^2)=\sigma^2, the population
variance.

However, if you replace \mu by \bar X, it is not true that

E((1/n)\sum (X_i-\bar X)^2)=\sigma^2.

You are assuming that E((X-Y)^2)=E((X-E(Y))^2), which is false.  Assuming
this is equivalent to saying E(Y^2)=E(Y)^2.
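
The gap between the \mu version and the \bar X version can be worked
out in a few lines. This is only a sketch, and it assumes (the message
does not state it) that the X_i are independent with common mean \mu
and variance \sigma^2:

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i-\mu)^2
  &= \sum_{i=1}^{n} (X_i-\bar X)^2 + n(\bar X-\mu)^2 \\
E\Big(\sum_{i=1}^{n} (X_i-\mu)^2\Big) &= n\sigma^2,
  \qquad E\big((\bar X-\mu)^2\big) = \sigma^2/n \\
E\Big(\sum_{i=1}^{n} (X_i-\bar X)^2\Big)
  &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2 \\
E\Big(\frac{1}{n}\sum_{i=1}^{n} (X_i-\bar X)^2\Big)
  &= \frac{n-1}{n}\,\sigma^2
\end{align*}
```

Under those assumptions, dividing by n-1 instead of n gives an
expectation of exactly \sigma^2.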


Ok, it looks like my implementation was incorrect.

I had:
 NB. $=(1/(n-1))( \sum E((X_i-\mu)^2)-nE((\bar X-\mu)^2 ))$
 t5=:(%n1)*+/"1 mean ([:*:(-mean))"1 samples

But that code clearly does not match the underlying concept.  As you
point out, I do not use mu at all here.

However, if I replace that t5=: line with
 t5=:(%n1)* ( (+/n#mean *:mu-~y)) - n*([:mean ([:*:mu-~mean)"1) samples

the assertions are still valid.

Moreover, they are valid both in the version for your proof, and in my
variant where n1=:n-1 and s0=:s1

At least, this is the case for every example I have tried.

> In this case, my "samples" precisely represent the entire population.
>
> For example, let's consider your hypothetical case of number of
> heads from a coin toss.
>
> With a fair coin, the population, with its distribution, is:
>    0: 50%
>    1: 50%
>
> The possible samples for a sample size of 2 are then
>    0 0: 25%
>    0 1: 25%
>    1 0: 25%
>    1 1: 25%

...

> Thus, for E(\sum (X_i-\bar X)^2)=\sum (x_i-\bar x)^2

I still don't get this.  I am using the notation x_i for an actual sample
value.  The left hand side is a number.  The right hand side will vary
based on the actual sample chosen.


Ok... I don't remember why I wrote that final quoted line the
way I did.  I thought I had copied and pasted in a line from
your proof that had the term E(\sum (X_i-\bar X)^2), but it's
clear that I did not.

Anyway, I should not have included =\sum (x_i-\bar x)^2
as that is irrelevant to my point, and does not accurately
reflect the concepts I am working with.

> I can determine \sum(X_i-\bar X)^2 for each of those
> potential sample cases (0, 0.5, 0.5, 0) and then
> average them.  For the fair coin, this average is
> 0.25.  For that unfair coin, I get 0.1875 for this
> average.

This precisely illustrates the point.  The population variance is 3%8,
not 3%16.  You can see in this case that using denominator n=2
gives the wrong answer, using denominator n-1 gives the correct
answer.


I am not sure this is a valid point.

See 
http://en.wikipedia.org/wiki/Standard_deviation#Estimating_population_standard_deviation_from_sample_standard_deviation
as an example treatment which seems to conflict with the assertions
you have advanced in this last paragraph.

As I understand the wiki write-up, even if you consider denominator
n-1 (here 1) as valid for samples which do not represent the population,
you must still use denominator n (here 2) when you are dealing with
the population distribution.
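
The fair-coin case quoted above is small enough to enumerate directly,
which makes the two denominators easy to compare. The following is a
minimal J sketch, not the script discussed in this thread: the names
mean, ss, pop, samples, popvar, sums, avg_n and avg_n1 are illustrative
assumptions, and only the fair coin with sample size 2 is covered.

```j
NB. Fair coin, sample size n = 2: enumerate every equally likely sample.
mean    =: +/ % #                  NB. arithmetic mean of a list
ss      =: [: +/ [: *: ] - mean    NB. sum of squared deviations from the list's own mean
pop     =: 0 1                     NB. population: tails = 0, heads = 1, 50% each
samples =: #: i. 4                 NB. rows 0 0, 0 1, 1 0, 1 1 (25% each)
n       =: 2

popvar  =: (ss pop) % # pop        NB. variance over the population itself: 0.25
sums    =: ss"1 samples            NB. per-sample sums of squares: 0 0.5 0.5 0
avg_n   =: mean sums % n           NB. averaged with denominator n:     0.125
avg_n1  =: mean sums % n - 1       NB. averaged with denominator n - 1: 0.25
```

In this sketch popvar divides by the population size because pop lists
the whole population, while avg_n and avg_n1 average a per-sample
statistic over all four possible samples.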

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
