Robert J. MacG. Dawson wrote:
>         Basically, it boils down to this: we should not do inference of any
> sort on small data sets unless we have valid prior knowledge to justify
> our methods.
> 
>         There are those who would omit the word "small" from this; myself, I am
> prepared to use a large data set as evidence of its own approximate
> normality, largely because when the data set is large, "approximate
> normality" may be very approximate indeed, as the Central Limit Theorem
> will take care of almost anything. For large N, the t test is
> essentially nonparametric.
> 
>         I would say that the real normality issue is usually not whether the
> use of the t test is justified, but whether the use of a transformation
> or rank-based method might be advisable, and whether the mean is a
> useful parameter for such a distribution.
> 
>         Normality testing, as usually done, is a bad joke.  It is powerful
> precisely in those cases (large sample) when normality isn't an issue,
> and "rejects" data sets for which the t test is perfectly appropriate.
> It lacks power in those cases (small samples) when there is cause for
> concern, and lets the unwary use t-distribution methods when they
> shouldn't.

Excellent summary. I'd add other common tests of assumptions to this (e.g., 
sphericity).
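Robert's power paradox is easy to demonstrate by simulation. The sketch below (a minimal illustration, assuming numpy and scipy are available; the distributions and sample sizes are my own hypothetical choices, not from the thread) applies the Shapiro-Wilk test to a large, mildly skewed sample — where the CLT makes the t test safe but the normality test will almost certainly reject — and to a small, heavy-tailed sample, where the t test is questionable but the test frequently fails to reject.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Large sample from a mildly skewed gamma distribution: the t test is
# fine here (CLT), yet a normality test will almost certainly reject.
large = rng.gamma(shape=10.0, scale=1.0, size=5000)

# Small sample from a heavy-tailed t(2) distribution: here the t test
# is dubious, yet the normality test often lacks the power to reject.
small = rng.standard_t(df=2, size=10)

_, p_large = stats.shapiro(large)
_, p_small = stats.shapiro(small)

print(f"large mildly-skewed sample: p = {p_large:.4g}")
print(f"small heavy-tailed sample:  p = {p_small:.4g}")
```

Running this repeatedly with different seeds makes the point more vividly: the large sample is rejected essentially every time, while the small one passes a sizeable fraction of the time.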

>         I would suggest using boxplots to spot very skewed or heavy-tailed
> samples; avoiding very small data sets whenever possible; and leaving

I'd also add normal probability plots, descriptive statistics (e.g.,
variances, epsilon values for sphericity), and residual diagnostics to the
list. Small samples will always remain a problem.
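For readers who want numeric stand-ins for those graphical checks, here is a hedged sketch (numpy/scipy assumed; the exponential sample is hypothetical) computing skewness, excess kurtosis, and the probability-plot correlation coefficient that `scipy.stats.probplot` returns alongside its plotting points:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)  # hypothetical skewed data

skew = stats.skew(sample)
kurt = stats.kurtosis(sample)  # excess kurtosis; 0 for a normal
(_, _), (slope, intercept, r) = stats.probplot(sample)

print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
print(f"probability-plot correlation r = {r:.3f}")  # near 1 for normal data
```

These numbers complement, rather than replace, the boxplots and probability plots suggested above — for very small samples none of them are trustworthy, which is exactly the problem.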

Thom


=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================
