On 6/29/07, John Randall <[EMAIL PROTECTED]> wrote:
> Raul Miller wrote:
>> On 6/29/07, John Randall <[EMAIL PROTECTED]> wrote:
>>
>>> Why?  This shows that the method you are using does not produce an
>>> unbiased estimator of variance, but mine does.
>>
>> This terminology "biased" vs. "unbiased" does not appear to be relevant
>> when talking about population variance.
>
> True, but it does with estimators: an unbiased estimator of a parameter is
> a statistic whose expected value is that parameter.

I do not know how to evaluate this claim numerically.
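One possible numerical check (a sketch in Python rather than J; the name `sample_variances` and the choices n=3, 100000 trials are illustrative, not from the thread): average both estimators over many random coin-flip samples and compare the averages to the population variance of 0.25.

```python
import random

random.seed(42)

def sample_variances(n, trials):
    """Average the "biased" (divide by n) and "unbiased" (divide by n-1)
    variance estimates over many random samples of fair coin flips."""
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        xs = [random.randint(0, 1) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)
        biased_total += ss / n          # divide by n
        unbiased_total += ss / (n - 1)  # divide by n-1 (Bessel's correction)
    return biased_total / trials, unbiased_total / trials

biased, unbiased = sample_variances(n=3, trials=100_000)
# The population variance of one fair coin flip is 0.25.  The unbiased
# average settles near 0.25; the biased average near (n-1)/n * 0.25.
print(biased, unbiased)
```

With enough trials, only the divide-by-(n-1) average settles at the parameter itself, which is what "unbiased" asserts.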

I would much rather be discussing issues such as:

 I may have made a mistake in my last calculation (t7).
 The assertion is
   $E(S^2)=\frac{1}{n-1}\left(n\sigma^2 - n(\sigma^2/n)\right)=\sigma^2$.

 But I believe $n(\sigma^2/n)$ is $\sigma^2$, and I cannot find any
 interpretation of $E(S^2)=\frac{1}{n-1}(n\sigma^2 - \sigma^2)$ that
 seems numerically valid.
...
 And I also note that I did not produce any calculations corresponding
 to the previous line, which carries the assertion
   $\sigma^2(\bar X)=\sigma^2/n$

 which worries me.

 It seems to me that $\sigma^2(X_i)$ is 0.25, which means I should expect
 that $\sigma^2/n$ is 0.125.  But from previous calculations, I expect that
 $\sigma^2(\bar X)$ is 0.125, which means that either I have woefully
 misunderstood some of the notation, or that the assertion
   $\sigma^2(\bar X)=\sigma^2/n$

 must be false.
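The assertion itself can also be probed by simulation (a sketch, assuming n=2 coin flips so that $\sigma^2/n = 0.25/2 = 0.125$; the function name is mine): estimate $\sigma^2(\bar X)$ directly as the variance of many simulated sample means.

```python
import random

random.seed(0)

def variance_of_sample_mean(n, trials):
    """Estimate sigma^2(X-bar) by drawing many samples of n fair coin
    flips and taking the variance of the resulting sample means."""
    means = []
    for _ in range(trials):
        xs = [random.randint(0, 1) for _ in range(n)]
        means.append(sum(xs) / n)
    mu = sum(means) / trials
    return sum((m - mu) ** 2 for m in means) / trials

v = variance_of_sample_mean(n=2, trials=100_000)
print(v)  # close to 0.25 / 2 = 0.125, i.e. sigma^2 / n
```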

>>> What is a sample which does not represent the population?
>>
>> The issue seems to be completeness.
>>
>> The "unbiased" estimator traditionally gets used when dealing with
>> variance for a sample which is not identical to the population.  In
>> other words, if there's any possibility that the distribution of the
>> sample is different from the distribution of the population, it seems
>> to be traditional to use the "unbiased estimator" (a factor of n/(n-1),
>> rather than the "biased estimator" factor of 1, when determining the
>> RMS deviation).
>
> I don't understand this.  A random sample is defined to be a list of
> independent random variables, each with the same distribution as the
> population.  What does "a sample which is not identical to the population"
> mean?

The distribution of the random variables which were used to generate
the sample is not, in the general case, the same as the empirical
distribution of any particular sample.

For example, let's consider our simple case: we count the number
of times "heads" comes up when we flip a fair coin.

The distribution of the population is:
  0: 50%
  1: 50%

However, any single random sample containing three values
cannot represent this distribution.  The potential samples are:
  0 0 0   (0: 100%,  1: 0%)
  0 0 1   (0: 66.67%, 1: 33.33%)
etc.
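The full enumeration is small enough to write out (a Python sketch; variable names are mine):

```python
from itertools import product
from collections import Counter

# Enumerate every possible sample of three fair-coin flips and check
# whether any sample's empirical distribution matches the population's
# 50/50 split.
matches = []
for xs in product((0, 1), repeat=3):
    counts = Counter(xs)
    if counts[0] / 3 == 0.5 and counts[1] / 3 == 0.5:
        matches.append(xs)

print(matches)  # [] -- no odd-sized sample can split 0s and 1s evenly
```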

Likewise, in a more complicated case -- we roll a die and determine
which number is face up -- any sample whose size is less than
the number of distinct elements in the population (six) cannot have
a sample distribution which matches the population distribution.
At least one value [which is present in the population] must be
missing from each individual sample.
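A brute-force check of this claim for samples of five die rolls (a sketch; there are only 6^5 = 7776 cases to examine):

```python
from itertools import product
from collections import Counter

# A sample of five die rolls has at most five distinct values, so no
# such sample can contain all six faces; at least one face always has
# empirical frequency 0 rather than the population's 1/6.
any_complete = any(
    len(Counter(xs)) == 6
    for xs in product(range(1, 7), repeat=5)
)
print(any_complete)  # False
```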

(And, of course, even when your sample size is an exact multiple
of the number of distinct elements in the population as a whole,
that still does not guarantee that your sample distribution matches
the population distribution.)
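A sketch of the smallest such case: with two coin flips, the sample size is an exact multiple of the two distinct coin values, yet only the samples with exactly one head match the 50/50 population distribution.

```python
from itertools import product
from collections import Counter

# Sample size 2 is an exact multiple of the 2 distinct coin values,
# yet only the samples with exactly one head and one tail match the
# 50/50 population distribution.
matching = sum(
    1 for xs in product((0, 1), repeat=2)
    if Counter(xs)[0] == 1 and Counter(xs)[1] == 1
)
print(matching)  # 2 of the 4 equally likely samples: (0, 1) and (1, 0)
```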

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
