Bob Hayden wrote:
> In addition to the approximation involved in using the CLT, most
> (possibly all) practical situations require that you estimate the
> population standard deviation with the sample standard deviation in
> calculating a standard error for use in constructing a confidence
> interval or doing a hypothesis test.  This introduces additional
> error.  Again, the error is small for large samples.  For smaller
> samples, it can be fairly large.  The usual way around that problem is
> to use the t distribution, which you can think of as a modified normal
> distribution -- the modifications being those needed to exactly offset
> this source of error.  The trouble is, in order to calculate those
> corrections, we need to know the shape of the population
> distribution.  The corrections incorporated into the t-distribution
> are those appropriate for a normal distribution.  So, when we use the
> t-distribution, we need to have the population close to normally
> distributed in order for the usual test statistic to have a
> t (not z) distribution.

Yes.

A lot of people miss the fact that the t-statistic has both
a numerator and a denominator. The numerator converges to the
normal when the CLT holds (though how quickly depends on the
population distribution).

However, to give you a t-distribution, the denominator needs to:
1) converge to a multiple of the square root of
   (a chi-squared random variable / its d.f.)
2) be independent of the numerator

In practice these only need to hold closely enough to yield
something close to a t-distribution at the sample size you're
interested in.
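
If you want to see how fast that breaks down, here's a rough
sketch in Python (plain numpy/scipy; the sample size, the
exponential example and all the names are just my own
illustrative choices). It simulates the one-sample t-statistic
and checks the actual rejection rate against the nominal 5%:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 10, 100_000
t_crit = stats.t.ppf(0.975, df=n - 1)   # nominal two-sided 5% cutoff

def tail_rate(draw, mean):
    # fraction of simulated |t| values beyond the nominal cutoff
    x = draw(size=(reps, n))
    t = (x.mean(axis=1) - mean) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    return np.mean(np.abs(t) > t_crit)

print(tail_rate(rng.standard_normal, 0.0))   # close to 0.05
print(tail_rate(rng.exponential, 1.0))       # off at n = 10

For the normal population both conditions hold exactly, so the
rate sits at 5%; for the exponential they hold only
approximately, and at n = 10 the discrepancy is still visible.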

This isn't all - even if you get this, you are only getting
robustness of the /significance level/. You also want decent
power-robustness, and that can be a problem for the t in some
circumstances; there's not much point in keeping close to the
right Type I error rate if you take no account of the Type II
error rate.
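
To make that concrete, here's a hedged sketch (numpy/scipy
again, with a contaminated-normal error distribution as an
illustrative choice of my own) of how heavy tails drain the
t-test's power: the outliers inflate the sample s.d. in the
denominator, so the same mean shift gets harder to detect.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, shift = 20, 50_000, 0.5

def power(noise):
    # rejection rate at the 5% level when the true mean is `shift`
    x = shift + noise(size=(reps, n))
    return np.mean(stats.ttest_1samp(x, 0.0, axis=1).pvalue < 0.05)

def contaminated(size):
    # 90% N(0,1), 10% N(0,25): same location, much heavier tails
    z = rng.standard_normal(size)
    return np.where(rng.random(size) < 0.1, 5 * z, z)

print(power(rng.standard_normal))   # decent power under normal errors
print(power(contaminated))          # substantially lower power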

There are times when a test of location that doesn't require
the normality assumption may be the less risky choice; the
small amount of power you give up when the data are exactly
normal (the relative efficiencies are very close to 1) is a
tiny price to pay for maintaining good efficiency when you
move away from the normal. Such a test might be a more-robust
version of the t-test, a randomization/permutation test, or a
rank-based equivalent.
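
As a sketch of that trade-off (Wilcoxon signed-rank as the
rank-based stand-in here - scipy's permutation_test would do
for the randomization version - and t(3) errors as an
illustrative heavy-tailed choice of my own):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, shift = 20, 5_000, 0.5

def powers(noise):
    # rejection rates at the 5% level for the t-test and Wilcoxon
    t_hits = w_hits = 0
    for _ in range(reps):
        x = shift + noise(size=n)
        t_hits += stats.ttest_1samp(x, 0.0).pvalue < 0.05
        w_hits += stats.wilcoxon(x).pvalue < 0.05
    return t_hits / reps, w_hits / reps

heavy = lambda size: rng.standard_t(3, size=size)

print(powers(rng.standard_normal))  # nearly tied at the normal
print(powers(heavy))                # the rank test pulls ahead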

Glen

