Yes, I was wrong about the need for normality of the residuals. I somehow
had the idea that estimates of the precision of estimates come directly
from normality of the individual errors, but it just ain't so. Estimating
the confidence limits for the mean of a sample is the way to see how the
central limit theorem smooths out a nasty distribution of
residuals. According to Bill Ware and Paul Swank, the sampling distribution
of the estimated variance of the mean takes a bigger sample to settle down
than the sampling distribution of the mean itself, but I can't really see
how that matters, unless you can make it the basis of the kind of test for
non-normality I am looking for.
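A minimal sketch of the confidence-limits idea, as a simulation: draw many samples from a skewed distribution, form the nominal 95% limits for the mean each time, and count how often they contain the true mean. The exponential distribution, the sample sizes, the number of replications and the use of the normal multiplier 1.96 (rather than a t value) are all assumptions chosen for illustration, and the function names are made up.

```python
import random
from math import sqrt

Z95 = 1.96  # normal-approximation multiplier (an assumption; a t value is more usual for small n)

def coverage(draw, true_mean, n, reps=2000, seed=1):
    """Fraction of nominal 95% intervals for the mean that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        m = sum(sample) / n
        var = sum((x - m) ** 2 for x in sample) / (n - 1)
        se = sqrt(var / n)
        if m - Z95 * se <= true_mean <= m + Z95 * se:
            hits += 1
    return hits / reps

def skewed(rng):
    # Exponential with mean 1: strongly right-skewed errors.
    return rng.expovariate(1.0)

def normal(rng):
    # Normal with mean 1 and SD 1, for comparison.
    return rng.gauss(1.0, 1.0)

if __name__ == "__main__":
    print("n  normal  skewed")
    for n in (5, 20, 80):
        print(n, coverage(normal, 1.0, n), coverage(skewed, 1.0, n))
```

Comparing the two columns at each sample size separates the effect of skewness from the effect of using 1.96 with a small sample, since the normal column suffers only from the latter.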
So when I plot residuals (Y) against predicteds (X), the scatter in the Y
direction can look quite discrete (as in residuals from a Likert scale with
only a few levels) and skewed (as in responses piled up at either end of
the Likert scale, or as in Robert Dawson's example of Poisson distributions
with small means). All that matters is that I have enough observations
for the central limit theorem to smooth out the "graininess". It's really
cool to learn that I may not have to use logistic regression for Likert
scales, but how do I know whether I have enough observations? Someone
suggested some function of the sample size and the third and/or fourth
moments. Anyone know of any simulations done on anything like that?
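One fact that bears on the moments suggestion: for n independent observations from the same distribution, the skewness of the sample mean is the skewness of a single observation divided by sqrt(n), and the excess kurtosis of the mean is the excess kurtosis of a single observation divided by n. A minimal sketch of that calculation follows. The Likert response probabilities are invented for illustration, the result is for a simple mean rather than for regression coefficients, and no cutoff for "small enough" is implied.

```python
from math import sqrt

def standardized_moments(values, probs):
    """Skewness and excess kurtosis of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    skew = sum((v - mean) ** 3 * p for v, p in zip(values, probs)) / var ** 1.5
    exkurt = sum((v - mean) ** 4 * p for v, p in zip(values, probs)) / var ** 2 - 3.0
    return skew, exkurt

def moments_of_mean(skew, exkurt, n):
    """Skewness and excess kurtosis of the mean of n independent observations."""
    return skew / sqrt(n), exkurt / n

if __name__ == "__main__":
    # Hypothetical 5-point Likert item with responses piled up at the top end.
    values = [1, 2, 3, 4, 5]
    probs = [0.02, 0.03, 0.10, 0.25, 0.60]
    skew, exkurt = standardized_moments(values, probs)
    print("single response:", skew, exkurt)
    for n in (10, 30, 100, 300):
        print(n, moments_of_mean(skew, exkurt, n))
```

Because skewness falls off only as 1/sqrt(n) while excess kurtosis falls off as 1/n, skewness is the moment that lingers longer as the sample grows.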
While we're on the subject of residuals vs predicteds... We are supposed
to check for substantial curvature in the plot (which would indicate the
model needs refining) and substantial non-uniformity in scatter for
different predicted values (heteroscedasticity, which gives the observations
with more scatter too much influence on the estimates and also stuffs up the
confidence limits). The rule for these two problems seems to be: if you
can see it on the plot, you should do something about it. Anyone got
anything more quantitative than that? I guess you have to make your own
decision about curvature, based on what you know from clinical experience
about what effects are substantial. (You could use Cohen's scale of
magnitudes as a default.) But what about the non-uniformity of
scatter? How big does a difference in variance between groups, or between
the two ends of the residuals vs predicteds plot, have to be before its
effect on the estimates and confidence limits is a concern?
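One way to put numbers on the between-groups case is to simulate it. The sketch below compares two groups with the same true mean but different standard deviations, forms the usual pooled-variance 95% limits for the difference in means, and reports how often the limits contain the true difference of zero. The group sizes, the SD ratios, the normal errors and the 1.96 multiplier are all assumptions for illustration, and the function name is made up.

```python
import random
from math import sqrt

def pooled_ci_coverage(n1, sd1, n2, sd2, reps=2000, seed=1, z=1.96):
    """Coverage of the pooled-variance 95% interval for a difference in means.

    Both groups have true mean 0, so the true difference is 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        ma, mb = sum(a) / n1, sum(b) / n2
        ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        pooled_var = ss / (n1 + n2 - 2)
        se = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
        if abs(ma - mb) <= z * se:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    # Equal group sizes, SD ratio 2.
    print("50 vs 50, SD 2 vs 1:", pooled_ci_coverage(50, 2.0, 50, 1.0))
    # Smaller group has the larger SD.
    print("20 vs 80, SD 2 vs 1:", pooled_ci_coverage(20, 2.0, 80, 1.0))
    # Larger group has the larger SD.
    print("20 vs 80, SD 1 vs 2:", pooled_ci_coverage(20, 1.0, 80, 2.0))
```

Varying the SD ratio and the group sizes in the calls at the bottom shows how far the actual coverage drifts from 95%, which is one candidate for the quantitative criterion asked about above.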
Will