Re: normality and regression analysis

Herman Rubin Fri, 12 May 2000 08:24:30 -0700
In article <8ffek1$1q2$[EMAIL PROTECTED]>, Mike <[EMAIL PROTECTED]> wrote:
>I would like to obtain a prediction equation using linear regression for
>some data that I have collected.  I have read in some stats books that
>linear regression has 4 assumptions, 2 of them being that 1) data is
>normally distributed and 2) constant variance.  In SAS, I have run
>univariate analysis testing for normality on both my dependent and
>independent variable (n=147). Both variables have distributions that are
>skewed.

There is no reason to assume that the data are normal.  For
linear regression to be exactly the MLE procedure, it is the
residuals from the true regression (the errors), not the
variables themselves, that need to have certain properties.
In well-designed experiments, the independent variables are
never normal.  Rarely will the dependent variables be close
to normal, either.
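
To make the MLE remark concrete, here is the standard derivation for a single predictor. It assumes the errors are independent and normal with constant variance, which is the condition under which least squares is exactly maximum likelihood:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n

\ell(\beta_0, \beta_1, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

\arg\max_{\beta_0, \beta_1} \ell
  = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
```

The x_i enter only as fixed constants (the likelihood is conditional on them), so their distribution, skewed or not, plays no role; the distributional assumption is on the errors alone.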

The key properties of the residuals are lack of correlation
with the independent variables, independence, and homoscedasticity
(constant variance).  Normality is well down the list.  Remember
that this applies to the residuals, not the data.  Linearity of
the model is a consequence of these.
 
>For the dependent variable:  skewness=0.69 and Kurtosis=0.25.
>For the independent variable: skewness=0.52 and Kurtosis= -0.47.

>The normality test (Shapiro-Wilk Statistic) states that both the dependent
>and independent variables are not normally distributed.

>I have also transformed the data (both dependent and independent variables)
>using log, arcsine, and square root transformations.  When I run the
>normality tests on the transformed data, the test shows that even the
>transformed data is not normally distributed.

If you have a linear model, transforming the variables will
generally make it non-linear.  Linearity of the relationship
remains the most important property; normality is one of the
least important.
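
A deterministic sketch (hypothetical numbers, Python standard library only) of why transforming a linear model is risky: an exactly linear relationship, y = 1 + 2x, stops being a straight line once both variables are logged, as was done in the question. The straight-line R^2 drops below 1, and the slope on the log scale changes with x.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical, exactly linear relationship with no noise: y = 1 + 2x.
x = [i / 10 for i in range(1, 101)]   # 0.1, 0.2, ..., 10.0
y = [1.0 + 2.0 * xi for xi in x]

r2_raw = r_squared(x, y)
# The same relationship after logging both variables.
r2_log = r_squared([math.log(v) for v in x], [math.log(v) for v in y])

# On the log-log scale the local slope d(log y)/d(log x) = 2x / (1 + 2x),
# which is not constant, so the transformed relationship is curved.
local_slope_low = 2 * x[0] / (1 + 2 * x[0])     # at x = 0.1
local_slope_high = 2 * x[-1] / (1 + 2 * x[-1])  # at x = 10

print(f"R^2 on the original scale: {r2_raw:.4f}")
print(f"R^2 after log-log:         {r2_log:.4f}")
print(f"log-scale slope at x=0.1: {local_slope_low:.3f}, at x=10: {local_slope_high:.3f}")
```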

>I realize that I can use nonparametric tests for correlation (I will use
>Spearman), but is there a nonparametric linear regression?  If not, is it
>acceptable to use linear regression analysis on data that is not normally
>distributed as a way to show there is a linear relationship?

Consider your probability model: what can you assume is a
linear function of what, with additive errors?  YOU, the
user, must answer that.
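
The question also asked whether a nonparametric linear regression exists. The reply does not name one, but a well-known option is the Theil-Sen estimator: the slope is the median of the slopes through all pairs of points. It still assumes y is a linear function of x with additive errors, which is exactly the modelling decision described above; what it drops is the reliance on normal errors, and it resists outliers. The sketch below is a brute-force version using only the Python standard library (for n = 147 there are 10,731 pairs, so this is cheap); the data and the intercept convention (median of y_i - slope * x_i) are illustrative choices. SciPy also provides `scipy.stats.theilslopes`.

```python
import itertools
import statistics


def theil_sen(x, y):
    """Theil-Sen line fit.

    Slope: median of the slopes through every pair of points.
    Intercept: median of y_i - slope * x_i (one common convention).
    """
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in itertools.combinations(zip(x, y), 2)
        if xj != xi  # pairs with tied x values carry no slope information
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def ols(x, y):
    """Ordinary least squares for one predictor, for comparison."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: points on y = 1 + 2x, except one gross outlier.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] = 100.0  # the line would give 19.0 here

ts_intercept, ts_slope = theil_sen(x, y)
ols_intercept, ols_slope = ols(x, y)
print(f"Theil-Sen: y = {ts_intercept:.2f} + {ts_slope:.2f} x")
print(f"OLS:       y = {ols_intercept:.2f} + {ols_slope:.2f} x")
```

The median-based fit recovers the line through the bulk of the points, while least squares is pulled toward the single outlier.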


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399
[EMAIL PROTECTED]         Phone: (765)494-6054   FAX: (765)494-0558


===========================================================================
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===========================================================================