Raul Miller wrote:
> On 6/27/07, John Randall <[EMAIL PROTECTED]> wrote:
>> Raul Miller wrote:
>> > For another, "standard deviation" jumps from a manageable quantity to
>> > an unknowable quantity when the result is determined, and it seems
>> > to me that it should be zero in that case.
>>
>> I think here you are confusing the random variable sample standard
>> deviation, which has a distribution, with the value of the sample
>> standard deviation on a specific sample, which does not.
>
> If I understand you correctly, you're saying that it's not meaningful
> to talk about probabilities of 1? (For example: no cumulative
> distribution across all sets.)
>
> I guess you could say that I'm uncomfortable working with a
> model of probability which imposes that limitation.
OK, you got me. I am assuming that the distribution is not constant and the sample size is greater than 1, both of which you need in order to estimate the population standard deviation.

>> > To my mind, the reasoning for the N-1 factor in standard deviation
>> > may be validly applied to the mean -- if I include all the deviation
>> > terms AND the deviation of the mean itself from itself, I should
>> > exclude the count of the mean in the divisor. But no one bothers
>> > to express it that way, so this seems an exercise in futility.
> ...
>> Suppose you have a population with distribution f, mean mu and
>> standard deviation sigma. ...
>> Now let S^2=((X1-mu)^2+...+(Xn-mu)^2)/n. Then E(S^2)=sigma^2.
>>
>> The problem in the latter is that you do not know mu: you have to
>> estimate it from the sample.
>>
>> If you write (Xi-mu)^2=((Xi-M)+(M-mu))^2 and expand it out, you will
>> be able to eliminate mu from the sum
>
> Can I?
>
> I think I have to divide by M-mu, which would be bogus if it turned
> out that M=mu (in other words, if I were working with an accurate
> model).
>
> That said, I've not actually proven that there's no other way to
> work the math on this -- have I overlooked something? [Let's
> limit this discussion to the cases where n=2, for now.]

Why limit it? Here is the proof for general n, in LaTeX-like notation.

Let $\mu$ and $\sigma$ be the population mean and standard deviation, and fix the sample size $n$. Let $\bar X$ be the sample mean and $S^2$ the sample variance, given by
$$ S^2=\frac{1}{n-1}\sum (X_i-\bar X)^2. $$
It is elementary to show $E(\bar X)=\mu$. We now show $E(S^2)=\sigma^2$.

First,
$$ \sum (X_i-\bar X)^2 = \sum ((X_i-\mu)-(\bar X-\mu))^2 = \sum (X_i-\mu)^2 - n(\bar X-\mu)^2, $$
since $\sum (X_i-\mu) = (\sum X_i) - n\mu = n(\bar X-\mu)$.
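That elimination step is just an algebraic identity, so it can be checked numerically. Here is a minimal Python sketch; the sample values and the value of $\mu$ are arbitrary choices, picked only to exercise the identity:

```python
from fractions import Fraction

# Hypothetical sample; mu is any fixed value, not necessarily the sample mean
xs = [Fraction(v) for v in (3, 7, 8, 12)]
mu = Fraction(5)
n = len(xs)
xbar = sum(xs) / n  # sample mean

# Identity: sum (X_i - mu)^2  =  sum (X_i - xbar)^2  +  n (xbar - mu)^2
lhs = sum((x - mu) ** 2 for x in xs)
rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - mu) ** 2
print(lhs == rhs)  # True
```

Using exact rationals avoids any floating-point noise, so the equality is exact, and the check holds for any sample and any $\mu$.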
Then
$$ E(S^2) = E\left(\frac{1}{n-1}\sum (X_i-\bar X)^2\right) = \frac{1}{n-1}\left( \sum E((X_i-\mu)^2) - nE((\bar X-\mu)^2) \right) = \frac{1}{n-1}\left( \sum \sigma^2(X_i) - n\sigma^2(\bar X) \right). $$
But $\sigma^2(X_i)=\sigma^2$ and $\sigma^2(\bar X)=\sigma^2/n$, so
$$ E(S^2) = \frac{1}{n-1}\left( n\sigma^2 - n\frac{\sigma^2}{n} \right) = \sigma^2. $$

Best wishes,

John

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
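As a postscript: $E(S^2)=\sigma^2$ can also be verified exactly for a small finite population by enumerating every equally likely sample. A Python sketch (the population and sample size here are arbitrary choices for illustration):

```python
from itertools import product
from fractions import Fraction

# Population: uniform on {0, 1, 2}
pop = [Fraction(v) for v in (0, 1, 2)]
mu = sum(pop) / len(pop)                              # population mean = 1
sigma2 = sum((x - mu) ** 2 for x in pop) / len(pop)   # population variance = 2/3

n = 3  # sample size

def s2(xs):
    """Sample variance with the n-1 (Bessel) divisor."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# E(S^2): average of S^2 over all equally likely samples of size n
# drawn with replacement (each has probability 1/3^n).
samples = list(product(pop, repeat=n))
expected_s2 = sum(s2(xs) for xs in samples) / len(samples)
print(expected_s2 == sigma2)  # True
```

Exact rationals make the expectation come out to exactly $2/3$, the population variance, which is the unbiasedness claim proved above.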
