Will Hopkins wrote:
>
> A supplementary question: Some time ago I saw someone defaulting to
> a non-parametric analysis when the sample size was small. My first
> reaction was: "that's precisely when you don't want to use
> non-parametrics, because they have less power than parametrics with
> small samples. (With large sample sizes parametrics and
> non-parametrics have the same power, for normally distributed
> residuals.)" Ah yes, but now I see that defaulting to
> non-parametrics for small sample sizes is the *safe* way to go,
> because with small samples you can't be sure the residuals in the
> population are normal, even when the residuals in the sample look
> perfectly normal.
Basically, it boils down to this: we should not do inference of any
sort on small data sets unless we have valid prior knowledge to justify
our methods.
There are those who would omit the word "small" from this; myself, I am
prepared to use a large data set as evidence of its own approximate
normality, largely because when the data set is large, "approximate
normality" may be very approximate indeed, as the Central Limit Theorem
will take care of almost anything. For large N, the t test is
essentially nonparametric.
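To put a rough number on that, here is a little simulation sketch of my own (it assumes numpy and scipy are on hand, and the exact figures will wobble from run to run): draw samples from a decidedly skewed exponential distribution, with the null hypothesis actually true, and see how often the one-sample t test rejects. With small n the Type I error rate tends to drift away from the nominal 5%; with large n the Central Limit Theorem pulls it back.

# A rough sketch of the CLT point (my illustration, not part of the original
# argument; assumes numpy and scipy are installed, and numbers vary by seed).
# The data are exponential -- badly skewed -- but H0: mu = 1 is true, so any
# rejection is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, alpha = 20_000, 0.05

for n in (10, 200):
    samples = rng.exponential(scale=1.0, size=(reps, n))   # true mean = 1
    _, p = stats.ttest_1samp(samples, popmean=1.0, axis=1)
    print(f"n = {n:3d}: empirical Type I error rate = {np.mean(p < alpha):.3f}")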
I would say that the real normality issue is usually not whether the
use of the t test is justified, but whether a transformation or a
rank-based method might be advisable, and whether the mean is even a
useful parameter for the distribution at hand.
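For instance (a toy illustration of mine, assuming only numpy): with lognormal-ish data the arithmetic mean is dragged upward by the long right tail, and the median, or equivalently the back-transformed mean of the logs, is often the more honest summary of a "typical" value.

# A toy sketch (my addition; assumes numpy) of why the mean may not be the
# parameter you care about for a skewed distribution: for lognormal data the
# arithmetic mean sits well above the bulk of the observations, while the
# median and the geometric mean describe a typical value.
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

print("arithmetic mean:", round(float(x.mean()), 3))                 # about exp(0.5) ~ 1.65
print("median:         ", round(float(np.median(x)), 3))             # about exp(0)   = 1.00
print("geometric mean: ", round(float(np.exp(np.log(x).mean())), 3)) # ~ median here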
Normality testing, as usually done, is a bad joke. It is powerful
precisely in those cases (large samples) where normality isn't an
issue, and "rejects" data sets for which the t procedures are perfectly
appropriate. It lacks power in those cases (small samples) where there
is cause for concern, and lets the unwary use t-distribution methods
when they shouldn't.
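If you want to watch the joke play out, here is a crude simulation sketch (mine, assuming scipy's Shapiro-Wilk test is available; the exact rates will vary by run): a huge sample from a t distribution with 10 degrees of freedom, a departure the t test would shrug off, gets flagged routinely, while a small exponential sample, which really should worry us, slips past the test more often than not.

# A crude sketch of the complaint above (my illustration; assumes scipy).
# Large samples from a mildly non-normal distribution (t with 10 df) are
# flagged routinely, while small samples from a badly skewed distribution
# (exponential) are missed more often than not.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
reps, alpha = 500, 0.05

big_mild  = [stats.shapiro(rng.standard_t(df=10, size=4000)).pvalue for _ in range(reps)]
small_bad = [stats.shapiro(rng.exponential(size=10)).pvalue for _ in range(reps)]

print("rejection rate, n = 4000, t(10):        ", np.mean(np.array(big_mild) < alpha))
print("rejection rate, n = 10, exponential(1): ", np.mean(np.array(small_bad) < alpha))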
I would suggest using boxplots to spot very skewed or heavy-tailed
samples (a quick sketch of what I mean appears at the end of this
note); avoiding very small data sets whenever possible; and leaving
normality testing to those who:
(1) can explain coherently why they need to know whether a population's
distribution departs from perfect normality;
(2) can affirm that they understand the Central Limit Theorem and its
application to Student's t distribution; and
(3) can affirm that this is one of the rare cases in which the theorem
does not render normality testing pointless.
I'm sure these people are out there somewhere. I don't think I've ever
met one.
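As for the boxplot suggestion above, here is the sort of quick look I have in mind (a sketch of my own, assuming matplotlib and numpy are installed; nothing more is intended than eyeballing the samples):

# A quick boxplot sketch (my own; assumes matplotlib and numpy). Side-by-side
# boxplots of three small samples make strong skew and heavy tails visible at
# a glance, with no formal normality test involved.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
samples = {
    "normal":       rng.normal(size=30),
    "skewed":       rng.exponential(size=30),
    "heavy-tailed": rng.standard_t(df=2, size=30),
}

fig, ax = plt.subplots()
ax.boxplot(list(samples.values()))
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(list(samples.keys()))
ax.set_ylabel("sample value")
plt.show()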
-Robert