Raul Miller wrote:
> On 6/27/07, John Randall <[EMAIL PROTECTED]> wrote:
>> Raul Miller wrote:
>> > For another, "standard deviation" jumps from a manageable quantity to
>> > an unknowable quantity when the result is determined, and it seems
>> > to me that it should be zero in that case.
>>
>> I think here you are confusing the random variable sample standard
>> deviation, which has a distribution, with the value of the sample
>> standard
>> deviation on a specific sample, which does not.
>
> If I understand you correctly, you're saying that it's not meaningful
> to talk about probabilities of 1?  (For example: no cumulative
> distribution across all sets.)
>
> I guess you could say that I'm uncomfortable working with a
> model of probability which imposes that limitation.

OK, you got me.  I am assuming that the distribution is not constant and
that the sample size is greater than 1, which you need in order to
estimate the population standard deviation.
>
>> > To my mind, the reasoning for the N-1 factor in standard deviation
>> > may be validly applied to the mean -- if I include all the deviation
>> > terms AND the deviation of the mean itself from itself, I should
>> > exclude the count of the mean in the divisor.  But no one bothers
>> > to express it that way, so this seems an exercise in futility.
> ...
>> Suppose you have a population with distribution f, mean mu and
>> standard deviation sigma.  ...
>> Now let S^2=((X1-mu)^2+...+(Xn-mu)^2)/n.  Then E(S^2)=sigma^2.
>>
>> The problem in the latter is that you do not know mu: you have to
>> estimate it from the sample.
>>
>> If you write (Xi-mu)^2=((Xi-M)+(M-mu))^2 and expand it out, you will
>> be able to eliminate mu from the sum
>
> Can I?
>
> I think I have to divide by M-mu, which would be bogus if it turned
> out that M=mu (In other words, if I were working with an accurate
> model).
>
> That said, I've not actually proven that there's no other way to
> work the math on this -- have I overlooked something?  [Let's
> limit this discussion to the cases where n=2, for now.]
>

Why limit it?  Here's the proof for general n, in LaTeX-like notation.


Let $\mu$ and $\sigma$ be the population mean and standard deviation.
Fix the sample size $n$. Let $\bar X$ be the sample mean, $S^2$ the
sample variance given by

$$ S^2=\frac{1}{n-1}\sum (X_i-\bar X)^2 $$

It is elementary to show $E(\bar X)=\mu$.  We now show
$E(S^2)=\sigma^2$.

$\sum (X_i-\bar X)^2

=\sum ((X_i-\mu)-(\bar X -\mu))^2

=\sum (X_i-\mu)^2 - 2(\bar X-\mu)\sum (X_i-\mu) + n(\bar X-\mu)^2

=\sum (X_i-\mu)^2 - n(\bar X-\mu)^2$,

since $\sum (X_i-\mu)=(\sum X_i) - n\mu = n(\bar X-\mu)$, which makes the
cross term equal to $-2n(\bar X-\mu)^2$.
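The identity is easy to check numerically.  Here is a quick Python sketch
(the sample values and the reference point mu are made up for illustration;
note that mu here is an arbitrary fixed point, not estimated from the data):

```python
# Check: sum((Xi - mu)^2) == sum((Xi - xbar)^2) + n*(xbar - mu)^2
# for an arbitrary fixed reference point mu.
xs = [1.0, 4.0, 2.0, 7.0]      # illustrative sample, not from the post
mu = 3.0                       # any fixed reference point
n = len(xs)
xbar = sum(xs) / n             # sample mean
lhs = sum((x - mu) ** 2 for x in xs)
rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - mu) ** 2
print(lhs, rhs)                # both equal 22.0 for this data
```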

Then $E(S^2)=E\left(\frac{1}{n-1}\sum (X_i-\bar X)^2\right)

=\frac{1}{n-1}\left( \sum E((X_i-\mu)^2) - nE((\bar X-\mu)^2) \right)

=\frac{1}{n-1}\left( \sum \sigma^2(X_i) - n\sigma^2(\bar X) \right)$.

But $\sigma^2(X_i)=\sigma^2$ and $\sigma^2(\bar X)=\sigma^2/n$, so

$E(S^2)=\frac{1}{n-1}\left(n\sigma^2 - n\frac{\sigma^2}{n}\right)
=\frac{(n-1)\sigma^2}{n-1}=\sigma^2$.
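You can also see the conclusion empirically.  The following Monte Carlo
sketch (my own illustration, with an arbitrary normal population and
sample size) averages the divisor-(n-1) estimator and the divisor-n
variant over many samples: the first lands near sigma^2, the second
near sigma^2*(n-1)/n.

```python
# Monte Carlo check that the divisor-(n-1) sample variance is unbiased,
# while the divisor-n variant underestimates sigma^2 by a factor (n-1)/n.
import random

random.seed(0)
mu, sigma, n, trials = 3.0, 2.0, 5, 200_000   # illustrative choices

sum_s2_unbiased = 0.0   # divisor n-1, as in the proof above
sum_s2_naive = 0.0      # divisor n, biased low

for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_s2_unbiased += ss / (n - 1)
    sum_s2_naive += ss / n

mean_unbiased = sum_s2_unbiased / trials
mean_naive = sum_s2_naive / trials
print(mean_unbiased)    # close to sigma^2 = 4.0
print(mean_naive)       # close to sigma^2*(n-1)/n = 3.2
```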

Best wishes,

John

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm