Raul Miller wrote:
> That said... I want to be able to talk about populations, distributions,
> samples, etc. using a consistent set of terminology regardless of
> whether or not the distribution of the population is known or unknown,
> or partially understood.

The standard terminology covers all these cases, although there are
some ambiguities.  First note that a sample space is just a set of
outcomes: it is different from a sample, which is a list of random
variables.

A population is just something with a distribution, that is, a probability
function f, which simply means f(x)>=0 for all x and \sum f(x)=1.

The population variance exists whether we know the population
distribution or not, whether we have an estimator or not, and is a
number.  If the distribution has probability function f, and X is a
random variable having distribution f, then if E(X)=\mu, the
population variance is defined as

\sigma^2=E((X-\mu)^2)=E(X^2)-(E(X))^2,

where \mu=E(X)=\sum x f(x), and E(X^2)=\sum x^2 f(x).  Each of these is
a number.
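
Purely as an illustration (nothing from your post, just a fair
six-sided die, where f(x)=1/6 for x in 1..6), these numbers can be
computed in a J session like this:

   x =: 1 2 3 4 5 6
   f =: 6 # 1r6               NB. probability 1/6 (rational) for each outcome
   mu =: +/ x * f             NB. \mu = \sum x f(x)
   mu
7r2
   sig2 =: +/ f * *: x - mu   NB. \sigma^2 = \sum (x-\mu)^2 f(x)
   sig2
35r12

Both mu and sig2 are just numbers (7/2 and 35/12), fixed by f alone.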


To calculate \sigma^2, you have to know f, but it is defined in any
case. The purpose of estimators is to get estimates for population
parameters when you do not know f.

A sample of size n is (as I have previously described) a list of n
independent random variables, each with distribution f, and a
statistic is a function of these variables.

To get around the terminology difficulties, suppose we have a
statistic A(X1,...,Xn).  This is a random variable, and so has an
expected value.  We say A is an unbiased estimator of \sigma^2 if

E(A)=\sigma^2.

We then conduct a statistical experiment and evaluate X1,...,Xn on the
outcome to give numbers x1,...,xn, the value of the random sample.  Then
A(x1,...,xn) gives a number which is an estimate for \sigma^2.  If
we know the distribution of A, we can also get a confidence interval.
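
As a concrete (and entirely made-up) J sketch of this step, one run of
the experiment with the die above might look like

   xs =: >: ? 10 # 6      NB. ten independent rolls: the numbers x1,...,x10
   xbar =: (+/ % #) xs    NB. the statistic \bar X evaluated on this outcome

Here xbar is a single number, an estimate of \mu=7r2; a different run
of the experiment gives a different estimate.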

All that I am asserting is that if

A(X1,...,Xn)=(1/(n-1)) \sum (Xi-\bar X)^2

then E(A)=\sigma^2, and the right-hand side of the expression for A is
called the sample variance.  There is some ambiguity here: the term
sample variance is used to refer to either

S^2=(1/(n-1)) \sum (Xi-\bar X)^2  (a random variable) or

s^2=(1/(n-1)) \sum (xi-\bar x)^2  (the value of this on a particular
sample).
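
Again purely as an illustration in J (assuming the die and the sample
xs from the made-up examples above), S^2 corresponds to a verb and s^2
to its value on a particular argument:

   svar =: (+/@:*:@:(- +/ % #)) % <:@#   NB. sum of squared deviations over n-1
   svar xs                               NB. s^2 for the particular sample xs

and a crude check of unbiasedness is to average svar over many samples:

   samples =: >: ? 10000 5 $ 6           NB. each row is one sample of size 5
   (+/ % #) svar"1 samples               NB. should be close to \sigma^2 = 35r12 (about 2.92)

The corresponding 1/n version, ((+/@:*:@:(- +/ % #)) % #), averages out
low, near ((n-1)/n)\sigma^2, which is exactly the bias that the n-1
denominator corrects.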

There is a typographical convention that distinguishes these cases:
population parameters are lower-case Greek letters, random variables
are upper-case Roman letters, and the values of random variables are
lower-case Roman letters.

However, I believe you are using the words sample and sampling in a
nonstandard sense.  Here's what I think you are doing.

You have a known population distribution f.  You now take a vector v
of length n in which x appears c(x) times, with the property that
c(x)/n is approximately f(x).  You regard this as a set of
equiprobable outcomes, which determines a population with
distribution g satisfying g(x)=c(x)/n.  This population has

\mu=\sum x g(x)=(1/n)\sum x c(x)

\sigma^2=\sum (x-\mu)^2 g(x)=(1/n)\sum (x-\mu)^2 c(x).

So in this case a denominator of n (a factor of 1/n) makes sense.
However, this
is not the sample variance in any normally accepted sense: there is no
sample in sight.  There are n equiprobable outcomes defining a random
variable X with P(X=x)=c(x)/n.
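
In J (still with made-up numbers), that construction looks like this;
every entry of v is an equally likely outcome, so the denominator is
the length n of v, not n-1:

   v =: 1 1 1 2 2 3 3 3 3 4       NB. x appears c(x) times; here n = 10
   mu =: (+/ % #) v               NB. \mu = (1/n) \sum x c(x)
   sig2 =: (+/ *: v - mu) % # v   NB. \sigma^2 = (1/n) \sum (x-\mu)^2 c(x)

This is simply the population variance of the population defined by g,
which is why no n-1 correction appears.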

Let me know if this is what you are getting at.

Best wishes,

John




