Bob Hayden wrote:
> In addition to the approximation involved in using the CLT, most
> (possibly all) practical situations require that you estimate the
> population standard deviation with the sample standard deviation in
> calculating a standard error for use in constructing a confidence
> interval or doing a hypothesis test. This introduces additional
> error. Again, the error is small for large samples. For smaller
> samples, it can be fairly large. The usual way around that problem is
> to use the t distribution, which you can think of as a modified normal
> distribution -- the modifications being those needed to exactly offset
> this source of error. The trouble is, in order to calculate those
> corrections, we need to know the shape of the population
> distribution. The corrections incorporated into the t-distribution
> are those appropriate for a normal distribution. So, when we use the
> t-distribution, we need to have the population close to normally
> distributed in order for the usual test statistic to have a
> t(not z)-distribution.
Yes.
A lot of people miss the fact that the t-statistic has both
a numerator and a denominator. The numerator will tend to
the normal when the CLT holds (but how quickly depends on
the distribution).
However, for the statistic to have a t-distribution, the
denominator needs to:
1) converge to a multiple of the square root of (a chi-squared
   random variable divided by its degrees of freedom), and
2) be independent of the numerator.
In practice these only need to hold closely enough to yield
something close to a t-distribution at the sample size you're
interested in.
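To see that in action, here is a minimal pure-Python sketch (standard library only; the populations, sample size, and replication count are made up for illustration). It simulates the one-sample t-statistic under a normal population and under a skewed (exponential) one, and compares the two-sided rejection rate against the t critical value:

```python
import math
import random
import statistics

def t_statistic(sample, mu0=0.0):
    """One-sample t-statistic: (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    return (xbar - mu0) / (s / math.sqrt(n))

def rejection_rate(draw, n, crit, mu0, reps=20000, seed=1):
    """Fraction of simulated samples with |t| exceeding crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) > crit:
            hits += 1
    return hits / reps

# Two-sided 5% critical value for t with 9 d.f. (n = 10).
crit = 2.262
# Normal population: the t-statistic has an exact t-distribution,
# so the rejection rate should sit near the nominal 5%.
normal_rate = rejection_rate(lambda r: r.gauss(0.0, 1.0), 10, crit, mu0=0.0)
# Exponential population (true mean 1): skewness distorts the
# small-sample distribution of the statistic.
expo_rate = rejection_rate(lambda r: r.expovariate(1.0), 10, crit, mu0=1.0)
print(f"normal population:      {normal_rate:.3f}")
print(f"exponential population: {expo_rate:.3f}")
```

With a strongly skewed population the simulated Type I error rate drifts away from the nominal level at small n, which is the failure of the denominator conditions showing up in practice.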
This isn't all - even if you get this, you are only getting
robustness of the /significance level/. You also want decent
robustness of power. That may be a problem for the t in some
circumstances; there's little point in holding close to the
right Type I error rate if you take no account of the
Type II error rate.
There are times when a test of location that does not require
the normality assumption may be less of a risk; the small
amount of power (the relative efficiencies are very close
to 1) you give up when the data are exactly normal is a
modest price to pay for maintaining good efficiency when you
move away from the normal. Such a test may be a more-robust
version of the t-test, a randomization/permutation test,
or a rank-based equivalent.
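As an illustration of the randomization/permutation option, here is a minimal pure-Python sketch of a two-sample test for a difference in location (the data values are invented for illustration). It repeatedly reshuffles the pooled observations over the two group labels and counts how often the reshuffled mean difference is at least as extreme as the observed one:

```python
import random
import statistics

def permutation_test(x, y, reps=10000, seed=1):
    """Approximate two-sided randomization test for a mean difference.

    Returns a Monte Carlo p-value: the proportion of random
    relabelings whose absolute mean difference is at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:]))
        if diff >= observed:
            hits += 1
    # Add-one adjustment so the reported p-value is never exactly zero.
    return (hits + 1) / (reps + 1)

x = [4.1, 5.0, 6.2, 5.5, 4.8, 5.9]  # hypothetical group 1
y = [3.2, 4.0, 3.8, 4.4, 3.5, 4.1]  # hypothetical group 2
p = permutation_test(x, y)
print(f"approximate p-value: {p:.4f}")
```

No normality assumption enters anywhere: the reference distribution is built from the data themselves, which is why such tests keep their level under departures from normality.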
Glen
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================