Yes, I was wrong about the need for normality of the residuals.  I somehow 
had the idea that estimates of the precision of estimates come directly 
from normality of the individual errors, but it just ain't so.  Estimating 
the confidence limits for the mean of a sample is the way to see how the 
central limit theorem smooths out a nasty distribution of 
residuals.  According to Bill Ware and Paul Swank, the sampling distribution 
of the estimated variance of the mean takes a bigger sample to settle down 
than the distribution of the mean itself, but I can't really see how that matters, unless 
you can make it the basis of the kind of test for non-normality I am 
looking for.
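
A minimal simulation sketch of the point about the mean versus the variance. 
The exponential error distribution, the sample sizes and the function names 
are assumptions chosen for illustration; they are not from Ware or Swank.  It 
draws repeated samples and reports the skewness of the sampling distribution 
of the mean and of the sample variance, as a rough measure of how far each 
is from "settled down".

```python
import random
import statistics

def skewness(values):
    """Sample skewness (third standardized moment) of a list of numbers."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def sampling_skewness(n, reps=4000, seed=1):
    """Skewness of the sampling distributions of the mean and of the variance
    for samples of size n from an exponential distribution (an assumed,
    deliberately skewed error distribution)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))
    return skewness(means), skewness(variances)

if __name__ == "__main__":
    for n in (5, 20, 80):
        skew_mean, skew_var = sampling_skewness(n)
        print(f"n={n:3d}  skew(mean)={skew_mean:.2f}  skew(variance)={skew_var:.2f}")
```

If the claim holds, the skewness of the mean should shrink towards zero 
faster than the skewness of the variance as n increases.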

So when I plot residuals (Y) against predicteds (X), the scatter in the Y 
direction can look quite discrete (as in residuals from a Likert scale with 
only a few levels) and skewed (as in responses piled up at either end of 
the Likert scale, or as in Robert Dawson's example of Poisson distributions 
with small means).  All that matters is that I have enough observations 
for the central limit theorem to smooth out the "graininess" in the 
sampling distribution of the estimates.  It's really 
cool to learn that I may not have to use logistic regression for Likert 
scales, but how do I know whether I have enough observations?  Someone 
suggested some function of the sample size and the third and/or fourth 
moments (skewness and kurtosis) of the residuals.  Anyone know of any 
simulations done on anything like that?
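
One way to approach the "enough observations" question is a small simulation 
like the sketch below.  Everything specific in it is an assumption made for 
illustration: a 5-point Likert response piled up at the top end (the WEIGHTS), 
a predictor that is truly unrelated to the response, and three sample sizes 
whose t critical values are hard-coded from standard tables.  It estimates 
how often the usual t test for the slope rejects at the 5% level when there 
is no real effect; a rate close to 0.05 suggests the sample is big enough 
for that particular response distribution.

```python
import math
import random

# Two-sided 5% critical values of Student's t for df = n - 2
# (standard table values, hard-coded to stay within the standard library).
T_CRIT = {10: 2.306, 30: 2.048, 100: 1.984}

# Hypothetical 5-point Likert item with responses piled up at the top end.
LEVELS = [1, 2, 3, 4, 5]
WEIGHTS = [0.02, 0.03, 0.10, 0.25, 0.60]

def slope_t(x, y):
    """t statistic for the slope in a simple least-squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    if sse == 0.0:
        return 0.0  # all responses identical: no evidence of a slope
    return slope / math.sqrt(sse / (n - 2) / sxx)

def false_positive_rate(n, reps=5000, seed=1):
    """Share of simulated data sets in which the slope test rejects at 5%,
    although the response is generated independently of the predictor."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.uniform(0, 1) for _ in range(n)]
        y = rng.choices(LEVELS, weights=WEIGHTS, k=n)
        if abs(slope_t(x, y)) > T_CRIT[n]:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    for n in sorted(T_CRIT):
        print(f"n={n:3d}  rejection rate={false_positive_rate(n):.3f}")
```

Changing WEIGHTS to match the observed response frequencies gives a check 
tailored to a particular data set.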

While we're on the subject of residuals vs predicteds...  We are supposed 
to check for substantial curvature in the plot (which would indicate the 
model needs refining) and substantial non-uniformity in scatter for 
different predicted values (heteroscedasticity, which gives the observations 
with more scatter too much weight in the estimates and also stuffs up the 
confidence limits).  The rule for these two problems seems to be: if you 
can see it on the plot, you should do something about it.  Anyone got 
anything more quantitative than that?  I guess you have to make your own 
decision about curvature, based on what you know from clinical experience 
about what effects are substantial.  (You could use Cohen's scale of 
magnitudes as a default.)  But what about the non-uniformity of 
scatter?  How big does a difference in variance between groups, or between 
the two ends of the residuals vs predicteds plot, have to be before the 
associated error in the estimates and confidence limits is a concern?
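
The question of how much non-uniformity is too much can also be explored by 
simulation.  The sketch below is one hedged example, not a general answer: 
it assumes two normally distributed groups of 10 and 30 observations with no 
true difference in means, and checks how often the usual pooled-variance 95% 
confidence interval for the difference contains zero when the smaller group 
has sd_ratio times the standard deviation of the larger one.  The group 
sizes, the ratios tried and the function names are all illustrative choices.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t for df = 10 + 30 - 2 = 38
# (standard table value, hard-coded to stay within the standard library).
T_CRIT_38 = 2.024

def pooled_ci_covers(small, large):
    """True if the pooled-variance 95% interval for the difference in
    means contains the true difference, which is zero here."""
    n1, n2 = len(small), len(large)
    pooled_var = ((n1 - 1) * statistics.variance(small)
                  + (n2 - 1) * statistics.variance(large)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = statistics.fmean(small) - statistics.fmean(large)
    return abs(diff) <= T_CRIT_38 * se

def coverage(sd_ratio, reps=5000, seed=1):
    """Coverage when the group of 10 has sd_ratio times the SD of the
    group of 30."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        small = [rng.gauss(0, sd_ratio) for _ in range(10)]
        large = [rng.gauss(0, 1) for _ in range(30)]
        hits += pooled_ci_covers(small, large)
    return hits / reps

if __name__ == "__main__":
    for ratio in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(f"SD ratio={ratio:.1f}  coverage={coverage(ratio):.3f}")
```

Coverage well below 0.95 means the interval is too narrow; coverage well 
above 0.95 means it is too wide.  Rerunning with equal group sizes shows how 
much the answer depends on the design.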

Will


