The problem with *any* hypothesis test for normality (or any other
"gatekeeper test", such as the infamous "F before t") is that it
answers the wrong question.
What you *want* to know (or should) is "do these data give me reason to
be confident that the population distribution is close enough to normal
[or whatever] for purpose X?"
What these tests tell you is "do these data give me reason to be sure
that the distribution is NOT perfectly normal [or whatever]?"
Mistaking one of these for the other (and acting on a rejection by not
doing whatever you were planning to) yields all sorts of ironies, for
instance:
For most purposes, as N -> infinity, you need to worry less about
normality. As N -> infinity, the K-S is _more_ likely to tell you to
start worrying.
For very small N, distributional assumptions are most important. For
such samples, the K-S will notice nothing.
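
Here is a quick Monte Carlo sketch (my own illustration, not part of the
original argument) that makes both ironies visible. The population is a t
distribution with 5 degrees of freedom -- mildly heavy-tailed, and a shape
that a t procedure handles comfortably once N is large -- and we count how
often a K-S test of normality rejects at each sample size. The choice of
population and of sample sizes is purely illustrative, and standardizing
with the sample mean and SD strictly calls for the Lilliefors variant of
the test; the plain K-S p-value used here is conservative, which if
anything understates the large-N effect.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reps = 2000      # Monte Carlo replications per sample size
alpha = 0.05

for n in (10, 30, 100, 1000, 10000):
    rejections = 0
    for _ in range(reps):
        # Mildly heavy-tailed population: t with 5 degrees of freedom.
        x = rng.standard_t(df=5, size=n)
        # Standardize with the sample mean and SD, then apply a plain K-S
        # test against N(0, 1).  (Strictly a Lilliefors setting; the plain
        # K-S p-value is conservative, so this understates rejections.)
        z = (x - x.mean()) / x.std(ddof=1)
        if stats.kstest(z, "norm").pvalue < alpha:
            rejections += 1
    print(f"N = {n:6d}: normality rejected in {rejections / reps:.1%} of samples")

The qualitative pattern is the point: at N = 10, where the shape of the
population matters most, the test almost never objects; as N grows, and the
central limit theorem has long since taken over, the rejection rate climbs
steadily.
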
A failure to reject the null hypothesis is a "soft" outcome, meaning
"we need more data". Rejection is the only definite outcome.
Thus, the possible outcomes of a gatekeeper + main test sequence are:
(1) You do not do the main test.
(2) You do the main test, but you cannot be sure you should have done
the main test.
This was probably described best by the old farmer in the story: "When
it don't rain, the roof don't leak; and when it rains, I can't fix her
nohow."
-Robert Dawson