In article <8fhfuf$[EMAIL PROTECTED]>,
Steve Gregorich <[EMAIL PROTECTED]> wrote:
>Mike,
>As a demonstration to myself, I once fit OLS regression
>models to data with (1) a non-uniformly distributed
>binary outcome and (2) a continuous outcome with a
>U-shaped distribution. I then used the same models to
>estimate the parameter standard errors using a naive
>bootstrap. The distribution of the bootstrap parameter
>estimates in both cases was normal (judging from the
>Q-Q plots and standard tests of normality). Normality
>of parameter estimates isn't everything--so I am not
>suggesting that you use OLS regression indiscriminately.
>But some people apparently believe there is no way for
>parameter estimates to be normally distributed when the
>data are not. That simply is not the case.
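The experiment described above can be sketched with nothing but the
standard library. The data-generating process here (uniform predictor,
centered-exponential errors, n=147) is my own illustrative assumption,
not the poster's actual data; the point is only that the naive
(case-resampling) bootstrap distribution of the slope comes out
looking roughly normal even though the residuals are badly skewed.

```python
# Sketch of a naive bootstrap of an OLS slope under skewed errors.
# The data-generating process is illustrative, not from the post.
import random
import statistics

random.seed(1)
n = 147
x = [random.uniform(0.0, 10.0) for _ in range(n)]
# Centered exponential errors: mean zero, but strongly right-skewed.
y = [1.0 + 0.5 * xi + (random.expovariate(1.0) - 1.0) for xi in x]

def ols_slope(xs, ys):
    """Closed-form OLS slope for simple regression."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

slope_hat = ols_slope(x, y)

# Naive bootstrap: resample (x, y) pairs with replacement and refit.
boot = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))

boot_se = statistics.stdev(boot)
```

A Q-Q plot or normality test of `boot`, as in the post, would show the
bootstrap slope estimates clustering symmetrically around `slope_hat`
despite the skewed error distribution.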
It is standard asymptotic theory that the regression
coefficients are asymptotically normal with the calculated
variance (covariance matrix in multiple regression) if the
true residuals (error terms) are merely uncorrelated with
the predictor variables, and the predictor variables have
reasonable variances. One can even get by with slightly
less. One does lose the precision of the p-values without
normality, but this is very definitely unimportant, and
the bootstrap does a first-order correction.
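That asymptotic claim is easy to check by simulation. The sketch below
(all numbers — sample size, true slope, error law — are illustrative
assumptions of mine) draws many datasets with non-normal errors that
are independent of the predictor, and checks that the usual
normal-theory 95% interval for the slope still has close to nominal
coverage.

```python
# Monte Carlo check: OLS slope is approximately normal even with
# skewed (centered-exponential) errors, so the standard 95% interval
# covers the true slope at roughly the nominal rate.
import math
import random
import statistics

random.seed(2)
n, true_slope, reps = 100, 2.0, 2000
hits = 0
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * xi + (random.expovariate(1.0) - 1.0) for xi in x]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    # Usual OLS standard error of the slope.
    resid = [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    se = math.sqrt(s2 / sxx)
    if abs(slope - true_slope) <= 1.96 * se:
        hits += 1
coverage = hits / reps
```

The coverage lands near 0.95 — the skewness of the errors shows up
only as a small, vanishing distortion, which is what the first-order
bootstrap correction mentioned above mops up.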
These are the real robustness considerations, not the
cute ones by the peddlers of methods which rely on much
stronger assumptions. Lack of correlation between
residuals and predictors is much less of an assumption
than symmetry, etc. It is also preserved under such
operations as aggregation of dependent variables.
>BTW, do you really have a book stating that the data
>need be normally distributed in order to satisfy the
>assumptions of OLS regression? I wouldn't be happy
>with that book.
I will make a much stronger statement; those who assume
that normality SHOULD hold in the model of "real" data
rarely understand probability, which is at the foundation.
It may be true that normality is approximately true, but
adjusting the data to try to make it exact is still likely
to mess up those relations which hold.
Least squares was heavily used in the 19th century in
physics, astronomy, and surveying, even non-linear least
squares. The data were not adjusted to normality, as
this would have destroyed the model.
>In article <8ffek1$1q2$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>>I would like to obtain a prediction equation using linear regression for
>>some data that I have collected. I have read in some stats books that
>linear regression has 4 assumptions, 2 of them being that 1) the data
>are normally distributed and 2) the variance is constant. In SAS, I have run
>>univariate analysis testing for normality on both my dependent and
>>independent variable (n=147). Both variables have distributions that are
>>skewed.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558