The problem with *any* hypothesis test for normality (or any other
"gatekeeper test", such as the infamous "F before t")  is that it
answers the wrong question.

        What you *want* to know (or should) is "do these data give me reason to
be confident that the population distribution is close enough to normal
[or whatever] for purpose X?"   

        What these tests tell you is "do these data give me reason to be sure
that the distribution is NOT perfectly normal [or whatever]?"

        Mistaking one of these for the other (and acting on a rejection by not
doing whatever you were planning to) yields all sorts of ironies, for
instance:

        For most purposes, as N -> infinity, you need to worry less
about normality -- the central limit theorem increasingly takes care of
it.  Yet as N -> infinity, the K-S is _more_ likely to tell you to
start worrying, because it gains the power to detect departures far too
small to matter.
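
        A quick simulation makes the point.  The sketch below (Python,
using numpy and scipy) is purely illustrative -- the chi-square(50)
population, the sample sizes, and the 500 replications are my own
arbitrary choices, not anything canonical.  It estimates how often the
K-S test rejects normality for a mildly skewed population at various N:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    df = 50                          # chi-square, 50 df: mildly skewed,
    mu, sigma = df, np.sqrt(2 * df)  # "close enough" to normal for most uses

    for n in (20, 200, 2000, 20000):
        reps, rejections = 500, 0
        for _ in range(reps):
            x = rng.chisquare(df, size=n)
            z = (x - mu) / sigma              # standardize to mean 0, sd 1
            _, p = stats.kstest(z, 'norm')    # K-S against the standard normal
            rejections += int(p < 0.05)
        print(f"N = {n:6d}: K-S rejects {rejections / reps:.0%} of the time")

        The rejection rate climbs steadily with N, even though the
population itself has not changed and the usual procedures only get
more robust as the sample grows.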

        For very small N, distributional assumptions matter most.  For
such samples, the K-S has essentially no power and will notice nothing.
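
        The same kind of sketch shows the other half of the irony.
Again the details -- an exponential population and N = 8 -- are just
illustrative choices of mine:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    reps, n, rejections = 2000, 8, 0
    for _ in range(reps):
        x = rng.exponential(scale=1.0, size=n)   # strongly skewed population
        # K-S against a normal with the population's own mean and sd (both 1)
        _, p = stats.kstest(x, 'norm', args=(1.0, 1.0))
        rejections += int(p < 0.05)
    print(f"N = {n}: K-S rejects only {rejections / reps:.0%} of the time")

        Even with a population as skewed as the exponential, a sample of
eight will only occasionally trigger a rejection.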

        A failure to reject the null hypothesis is a "soft" outcome, meaning
"we need more data".  Rejection is the only definite outcome.
Thus, the possible outcomes of a gatekeeper + main test sequence are:

        (1) You do not do the main test.
        (2) You do the main test, but you cannot be sure you should have
done the main test.

        This was probably described best by the old farmer in the story: "When
it don't rain, the roof don't leak; and when it rains, I can't fix her
nohow."

        -Robert Dawson