Voltolini wrote:
> 
> Dear friends,
> 
> How to choose between parametric and non-parametric tests? Following any
> textbook we can find the idea of using histograms, kurtosis, skewness, and
> tests like Shapiro-Wilk to test normality. In the case of homoscedasticity,
> we can use one of the best for this task, the Levene test.

        Standard hypothesis testing is here (as in many other
applications) the right answer to the wrong question.

        Tests of normality answer the question

        "is there enough evidence to conclude that the population from
which the data were drawn is not perfectly normally distributed?"

        to which the answer will almost always be "yes", with the
probability increasing as the sample size rises.  The question you
usually *want* to know the answer to is (phrased negatively so as to have
the same "sense" as the first one)

        "is the population from which the data were drawn far enough
from being normally distributed that statistics based on the normal
distribution will be a bad approximation?"

        to which the answer is often "no", with probability increasing
with N. (This is because the Central Limit Theorem tells us that as N
gets large the sample means approach a normal distribution...) 
Short-tailed distributions, in particular, are excellently behaved even
for quite small N.  "Short-to-medium" skewed distributions are also fairly
well-behaved; distributions with at least one heavy tail are more
problematic, and in extreme cases (e.g., Cauchy) will *not* settle down.
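        A small simulation makes this concrete (a Python sketch; the
Uniform(0,1) population and the sample sizes are my illustrative choices,
not from the original post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A short-tailed population: Uniform(0, 1).  Decidedly non-normal,
# and at N = 1000 Shapiro-Wilk says so emphatically...
p_shapiro = stats.shapiro(rng.uniform(0, 1, 1000)).pvalue

# ...yet the nominal-95% t-interval for the mean covers the true mean
# (0.5) almost exactly 95% of the time even at N = 30, because the
# sample mean is already close to normally distributed.
reps, n = 2000, 30
covered = 0
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    lo, hi = stats.t.interval(0.95, n - 1, loc=x.mean(), scale=stats.sem(x))
    covered += lo <= 0.5 <= hi
coverage = covered / reps
```

Here the normality test "fails" resoundingly while normal-theory
inference about the mean works essentially perfectly.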


For small N, the set of distributions for which the "standard"
techniques fail to work well is large enough that many of them will not
be rejected by any reasonable test of normality.  For large N, the set
of distributions that will be rejected is large enough that it includes
many distributions that are perfectly well-behaved for "standard"
techniques.  In between, you may have both problems coexisting, as the
distributions that get noticed by the normality tests are not always the
ones that cause problems with other tests.
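        The small-N side can be checked the same way (again an
illustrative sketch; the lognormal population is my choice of a
distribution with one heavy tail):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Lognormal(0, 1): one heavy right tail, a genuine problem case for
# t-procedures at small N.  How often does Shapiro-Wilk notice at N = 10?
reps, n = 1000, 10
rejected = sum(stats.shapiro(rng.lognormal(0.0, 1.0, n)).pvalue < 0.05
               for _ in range(reps))
miss_rate = 1 - rejected / reps  # fraction of samples that "pass" the test
```

A substantial fraction of such samples sail through the test, even
though the population is exactly the kind that causes trouble.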

        There is a second important question that one should ask:

        "is the population from which the data were drawn one that is
well-described by the mean?"

        This depends to a large extent on what parametric model, if any,
one can use, and also on the shape of the distribution. For instance,
the mean is appropriate as a descriptive statistic for all symmetric
distributions (unless it fails to exist), and within many one-parameter
families of distributions such as Poisson, exponential, or U[0,A]. It is
probably *not* appropriate to compare the means of (say) a Poisson and a
geometric RV; indeed, it is not clear that it is appropriate to compare
them at all, as they are dimensionally inequivalent!

        That said, it does NOT follow that the sample mean is always the
right test statistic to use for a hypothesis about the mean.  For
instance, if a distribution is symmetric but heavy-tailed, you may want
to use the sample median as an estimator for the mean... just because it
has more efficiency (that is, it will in the long run be closer to the
true value). There are also the famous superefficient estimators for
uniform distributions that work with the cutoff point, and achieve an
order of magnitude more precision than the sample mean would!
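        Both claims are easy to check by simulation (a sketch; the
t-distribution with 2 df and Uniform(0,1) stand in for "symmetric but
heavy-tailed" and "uniform", and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n = 4000, 51

# Symmetric but heavy-tailed: Student t with 2 df (true mean 0).
# The sample median beats the sample mean as an estimator of the mean.
t2 = rng.standard_t(2, size=(reps, n))
mse_mean = np.mean(np.mean(t2, axis=1) ** 2)
mse_median = np.mean(np.median(t2, axis=1) ** 2)

# Uniform(0, A) with A = 1: the cutoff-based estimator (n+1)/n * max
# has MSE of order 1/n^2, versus order 1/n for twice the sample mean.
u = rng.uniform(0, 1, size=(reps, n))
mse_2xbar = np.mean((2 * u.mean(axis=1) - 1) ** 2)
mse_max = np.mean(((n + 1) / n * u.max(axis=1) - 1) ** 2)
```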

        Here, the mean does not become more (or less) appropriate with
large N. However, normality tests still reject many distributions for
which the mean is perfectly appropriate, especially as N gets large.

        For practical purposes, I would recommend the use of
box-and-whisker plots. While these, like everything else, tell us little
for very small N (and must be read carefully for very large N), they
indicate skewness, systematic heavy-tailedness, and outliers rather
well.  They do require the use of some judgement, as they do not create a
(bogus) dichotomy between "reject" and "proceed"; that's why the
statistician gets paid the big bucks while the software package gets
switched off at the end of the session. It's not *meant* to do the
thinking. 
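        For those who want the numbers behind the picture, the
quantities a box-and-whisker plot displays are simple to compute (a
sketch using Tukey's conventional 1.5*IQR fences; `box_summary` is a
hypothetical helper, not a standard routine):

```python
import numpy as np

def box_summary(x):
    """The numbers behind a box-and-whisker plot: quartiles, whisker
    ends, and the points flagged as outliers by the 1.5*IQR rule."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {"q1": q1, "median": med, "q3": q3,
            "whiskers": (inside.min(), inside.max()),
            "outliers": sorted(x[(x < lo_fence) | (x > hi_fence)])}

summary = box_summary([1, 2, 3, 4, 5, 100])
```

The summary flags 100 as an outlier while the whiskers stop at the
most extreme points inside the fences; reading it still takes judgement.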

        Similar arguments apply to tests of homoscedasticity. If N is
small, heteroscedasticity is more of a problem and the tests probably
won't detect it. If N is larger, moderate heteroscedasticity can often
be ignored and the preliminary test will be hypersensitive. Either way
you lose.  
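        The same kind of simulation illustrates the point for Levene's
test (a sketch; the group sizes and the spread ratios are my
illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Small N, a real 2:1 ratio of standard deviations: Levene's test
# misses it much of the time.
power_small = np.mean([stats.levene(rng.normal(0, 1, 10),
                                    rng.normal(0, 2, 10)).pvalue < 0.05
                       for _ in range(500)])

# Large N, a trivial 1.1:1 ratio: Levene's test flags it most of the
# time, even though the heteroscedasticity is harmless.
power_large = np.mean([stats.levene(rng.normal(0, 1, 2000),
                                    rng.normal(0, 1.1, 2000)).pvalue < 0.05
                       for _ in range(200)])
```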

        The morals: 

        In these matters, practical significance has nothing to do with
statistical significance.

        For medium sized data sets: learn how large a breach of the
assumptions you can get away with. (If you don't have any other way to
tell, simulate.) 
        
        For large data sets: as above, and expect good things.
         
        For small data sets: accept that you may not be able to get
definitive results, and gather more data when you can. 
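        As an instance of the "simulate" advice: a sketch that
estimates how far the actual type I error of a nominal-5% one-sample
t-test drifts when the data are skewed (the exponential population and
the sample sizes are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def empirical_size(n, reps=4000):
    """Actual type I error of a nominal-5% two-sided one-sample t-test
    when the data are exponential with true mean 1."""
    hits = 0
    for _ in range(reps):
        x = rng.exponential(1.0, n)
        hits += stats.ttest_1samp(x, popmean=1.0).pvalue < 0.05
    return hits / reps

size_n10 = empirical_size(10)    # inflated above 5% by the skewness
size_n200 = empirical_size(200)  # close to the nominal 5%
```

The breach that matters at N = 10 has largely healed itself by N = 200.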

        Don't just assume something because it gives you an answer; and
don't think that you are justified in doing so because a test that has
very little power to detect deviations doesn't find any. That would be
like having a smoke detector with no battery in and concluding that
there was no fire because it was not beeping!

        -Robert Dawson
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
