Here's a response to the two people who have replied to the list 
about my query.  (Thanks heaps for your input.  This list is 
wonderful.  If it ever loses its institutional support, and no one 
else wants to pick it up, I will.  I'd run it with listproc, and we 
would have moderators to filter out the spam.)

At 6:32 PM -0500 15/1/01, Bob Wheeler wrote:
>In practice the observed residuals are highly
>correlated and, if the design is a good one,
>fluctuate in a small space with few degrees of
>freedom. Applying any test for non-normality to
>such observed residuals is fairly futile.

The test for normality is a test of distribution of magnitudes, not 
independence of the errors or of the residuals.  Yes, the residuals 
are correlated, but that may have no bearing on the normality of 
their distribution.  If you fit a straight line to three points drawn 
from a bivariate normal population (two correlated, normally 
distributed variables), are the residuals normally distributed?  I 
guess they must 
be, or the regression analysis wouldn't give correct confidence 
limits.  (BTW, "highly" correlated is surely not correct for analyses 
with a few estimated parameters and many degrees of freedom.  And I'm 
not sure why a good design would be one in which the residuals had 
few degrees of freedom.  The bigger the sample size, the better, 
except when you end up with more precision for your effects than you 
need.)

It is probably futile to TEST for non-normality whatever the sample 
size, because, as Robert Dawson pointed out, large samples usually 
test positive for non-normality even when the residuals look 
reasonably normal, whereas small samples usually test negative even 
when the residuals are quite non-normal. But it is not futile to 
ESTIMATE non-normality for large sample sizes, if you know the 
magnitude of non-normality that starts to screw up your estimates and 
their confidence limits.  And it may not be futile to estimate 
non-normality for small sample sizes either, depending on how you 
think the residuals should be distributed.  For example, you may have 
good reasons for doing a log transformation, so check the residuals 
after log transformation.  If they have a higher normality score than 
the residuals from the raw variable, fine, even though neither set of 
residuals is statistically significantly different from normal. 
What will matter is HOW non-normal the residuals are.  My question 
about the magnitude of deviation from normality still stands.
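To make the test-vs-estimate point concrete, here is a minimal simulation sketch in Python.  The particular distributions and sample sizes (a t with 10 df for "large but mildly non-normal", an exponential for "small but severely non-normal") are my own illustrative choices, not anything from the thread:

```python
# Sketch: a formal normality test reacts to sample size at least as
# much as to how non-normal the data actually are.  The distributions
# and sizes here are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Large sample, mildly non-normal (t with 10 df): the Shapiro-Wilk
# test tends to reject normality even though the shape is close.
large_mild = rng.standard_t(df=10, size=2000)
_, p_large = stats.shapiro(large_mild)

# Small sample, strongly non-normal (exponential): the test often
# fails to reject even though the shape is far from normal.
small_severe = rng.exponential(size=15)
_, p_small = stats.shapiro(small_severe)

print(f"large, mildly non-normal sample:   p = {p_large:.4f}")
print(f"small, severely non-normal sample: p = {p_small:.4f}")
```

Run this a few times with different seeds and the pattern Robert described shows up: significance tracks N, not the practical magnitude of the non-normality.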

At 10:41 AM -0400 16/1/01, Robert J. MacG. Dawson wrote:
>       There are those who would omit the word "small" from this; myself, I am
>prepared to use a large data set as evidence of its own approximate
>normality, largely because when the data set is large, "approximate
>normality" may be very approximate indeed, as the Central Limit Theorem
>will take care of almost anything. For large N, the t test is
>essentially nonparametric.

Are you suggesting that you don't need transformation for large 
sample sizes?  I think you have a popular misconception about the 
central limit theorem.  Sure, the mean of a large sample is normally 
distributed, whatever the parent distribution, but that's not the 
issue.  It's the residuals that have to be normal. If your variable 
is non-normally distributed, it doesn't matter how big your sample 
size is: the precision of your estimates based on the raw variable 
will never be correct.  Or to put it another way, you will find a 
substantial difference between the estimates from the untransformed 
vs the transformed data.  Which estimates do you use?  The ones from 
the analysis that has residuals closer to normality.
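A quick simulation separates the two claims (a sketch only; the exponential errors and the sample sizes are my own illustrative choices): the central limit theorem normalizes the sampling distribution of the MEAN as N grows, but the residuals keep the parent distribution's shape no matter how large N gets.

```python
# Sketch: CLT applies to the mean, not to the residuals.
# Exponential errors chosen arbitrarily as a strongly skewed case.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Residuals about a fitted mean stay as skewed as the parent
# distribution, regardless of sample size.
for n in (1000, 100000):
    errors = rng.exponential(size=n)
    residuals = errors - errors.mean()
    print(f"n = {n:6d}: residual skewness = {stats.skew(residuals):.2f}")

# By contrast, means of many such samples ARE close to normal
# (skewness near 0), which is all the CLT promises.
means = rng.exponential(size=(5000, 1000)).mean(axis=1)
print(f"skewness of 5000 sample means (n = 1000 each): {stats.skew(means):.2f}")
```

The residual skewness stays near 2 (the exponential's true skewness) at both sample sizes, while the skewness of the sample means is near zero.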

>       I would suggest using boxplots to spot very skewed or heavy-tailed
>samples;

You see, you are using a qualitative estimate of non-normality!  I 
want a rule based on a quantitative estimate.
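For what such a quantitative rule might look like, here is a sketch based on sample skewness and excess kurtosis of the residuals.  The cutoffs used (|skewness| < 1, |excess kurtosis| < 2) are placeholders of my own, not an established rule; the question of where the cutoffs should sit is exactly the open question:

```python
# Sketch of a quantitative rule: estimate skewness and excess kurtosis
# of the residuals and compare to cutoffs.  The cutoffs here are
# hypothetical placeholders, not an established standard.
import numpy as np
from scipy import stats

def nonnormality_report(residuals, skew_cut=1.0, kurt_cut=2.0):
    g1 = stats.skew(residuals)
    g2 = stats.kurtosis(residuals)   # excess kurtosis (normal = 0)
    acceptable = abs(g1) < skew_cut and abs(g2) < kurt_cut
    return g1, g2, acceptable

rng = np.random.default_rng(2)
resid = rng.normal(size=200)         # residuals from a well-behaved fit
g1, g2, ok = nonnormality_report(resid)
print(f"skewness = {g1:.2f}, excess kurtosis = {g2:.2f}, acceptable = {ok}")
```

Unlike a boxplot, this gives a number that can be compared against a threshold, which is the kind of rule being asked for.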

Will



=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================
