Raul:

Obviously we are at cross-purposes.  From your last post, I think I
have an inkling of what the problem is.  The following represents my
understanding of what we are talking about.

I assume I have a population with unknown distribution f.  I want to
learn something about this population, and in particular estimate its
mean and variance.

To do this, I take a finite random sample X1,...,Xn.  These are
independent random variables each having distribution f.  I then
define the sample mean and the sample variance.  These are statistics,
which means they are functions of X1,...,Xn, and so they are random
variables whose distribution depends on f.  I show that the expected
value of the sample mean is the population mean, and the expected
value of the sample variance is the population variance, which I
describe by saying that these are unbiased estimators of the
appropriate parameters.  The calculation uses only the mean and
variance of f, not the full distribution, which is good because f is
unknown.
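
Since this is the J forum, here is how I would write those two
statistics as verbs (just the standard idioms; a sketch, not anything
you posted):

   mean =: +/ % #                    NB. sample mean: sum over count
   dev  =: - mean                    NB. deviations from the sample mean
   svar =: (+/@:*:@:dev) % <:@#      NB. squared deviations summed, over n-1

Applied to a particular list of observations these give numbers;
applied to the random vector X1,...,Xn they are the random variables
I am calling the sample mean and the (unbiased) sample variance.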

Now I take an actual sample x1,...,xn and evaluate the statistic on
the sample.  This gives me numbers which estimate the population mean
and variance.  If I wanted to know how good these estimates are, I
would use confidence intervals.  This requires some knowledge of the
distribution of the statistics.

For large samples, or for samples from a normal distribution, I know
the sample mean is approximately normally distributed with mean equal
to the population mean and variance \sigma^2/n, where \sigma^2 is the
population variance.  If I know \sigma, I am done.  If I have to
estimate \sigma from the sample, the standardized sample mean
(\bar{X} - \mu) / (S / \sqrt{n}) has the t-distribution with n-1
degrees of freedom instead.
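
Continuing the sketch above, an approximate large-sample 95% interval
for the population mean, using the normal value 1.96 (with a small
sample you would substitute the appropriate t quantile):

   se =: %:@(svar % #)               NB. estimated standard error of the mean
   ci =: mean + _1.96 1.96 * se      NB. approximate 95% interval endpoints

ci applied to a list of observations returns the two endpoints.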

For the variance, constructing a confidence interval is more
difficult, and requires more knowledge of the population distribution
f.  For example, if f is normal, then (n-1) S^2 / \sigma^2 has the
chi-squared distribution with n-1 degrees of freedom, where S^2 is
the sample variance.
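
For completeness, inverting that fact gives the standard interval for
\sigma^2:

   [ (n-1) s^2 / \chi^2_{\alpha/2},  (n-1) s^2 / \chi^2_{1-\alpha/2} ],

where \chi^2_p is the point of the chi-squared distribution with n-1
degrees of freedom having right-tail probability p, and s^2 is the
observed sample variance.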

The above summarizes the estimation process I am doing.  I now believe
you are doing something different, which explains why we are having
trouble communicating.

You have a population with known distribution f, and a random variable
X whose distribution is f.  You then construct a sample space of
equiprobable outcomes and define a random variable Y on this whose
distribution is g, where g is an approximation to f.  The mean and
variance of Y are then expected to approximate the mean and variance
of X.

For you, the sample mean is E(Y), a number approximating E(X), and the
sample variance is \sigma^2(Y), a number approximating \sigma^2(X).
This explains some of the confusion we have had, where I have been
insisting that the sample mean is a random variable, and you have been
insisting it is a number.
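
If I have understood you, then in J your two quantities are just
plain numbers (I am guessing that ys names the list of equiprobable
outcomes you constructed):

   EY   =: mean ys                   NB. a number, E(Y)
   varY =: mean *: ys - EY           NB. a number, \sigma^2(Y)

Note the divisor here is n, not n-1: varY is the variance of the
known distribution g, not the unbiased sample variance of a random
sample.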

The term "sample space" is misleading.  For the finite distributions
we are discussing, the sample space is just the set of outcomes.
Applying a random variable to this is a sample of size 1: if S is the
sample space, X is a function X:S->R with distribution f.

When I talk about a sample of size n, I am talking about a function

X1 x X2 x ... x Xn : S x S x ... x S -> R x R x ... x R,

with distribution f x f x ... x f.  This random variable ranges over
all possible samples of size n.
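
For a finite S this product random variable can even be tabulated
outright.  A sketch, taking a single die as S purely for
illustration:

   S  =: 1 + i. 6                    NB. toy sample space: faces of a die
   s2 =: > , { 2 # < S               NB. all 36 ordered samples of size 2
   mean"1 s2                         NB. the sample mean at every such sample

The last line is the whole distribution of the sample mean for n=2,
which is what I mean when I insist the sample mean is a random
variable.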

Please let me know if this accurately represents what you are doing.
If so, I would suggest you proceed more straightforwardly.  Since you
know the population distribution f, you can estimate the population
parameters without worrying about sampling or statistics.  For
example, for the population mean, you want to find an estimate for

\sum x f(x) or \int x f(x) dx,

depending on whether the population is discrete or continuous.  This
can be done using techniques from numerical analysis rather than
statistics.  You are constructing an approximation g to f, and then
figuring out the above with g substituted for f (I think).
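
For instance, if f were a continuous density, even a crude Riemann
sum would do.  A sketch, taking the standard normal density as a
stand-in for whatever your f actually is:

   f  =: (%: 2p1) %~ ^@-@-:@*:       NB. standard normal density, an example f
   h  =: 0.01
   xs =: _6 + h * i. 1201            NB. grid covering [_6, 6]
   w  =: h * f xs                    NB. approximate probability weights
   m  =: +/ xs * w                   NB. approximates \int x f(x) dx
   v  =: +/ w * *: xs - m            NB. approximates the population variance

Here m and v should come out near 0 and 1, and no sampling is
involved anywhere; you would substitute your own f and a suitable
grid.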

Best wishes,

John

