Mike wrote in message <8ffek1$1q2$[EMAIL PROTECTED]>...
>I would like to obtain a prediction equation using linear regression for
>some data that I have collected.  I have read in some stats books that
>linear regression has 4 assumptions, 2 of them being that 1) data is
>normally distributed and 2) constant variance.  In SAS, I have run
>univariate analysis testing for normality on both my dependent and
>independent variable (n=147). Both variables have distributions that are
>skewed.
>
>For the dependent variable:  skewness=0.69 and Kurtosis=0.25.
>For the independent variable: skewness=0.52 and Kurtosis= -0.47.
>
>The normality test (Shapiro-Wilk Statistic) states that both the dependent
>and independent variables are not normally distributed.
>
>I have also transformed the data (both dependent and independent variables)
>using log, arcsine, and square root transformations.  When I run the
>normality tests on the transformed data, the test shows that even the
>transformed data is not normally distributed.
>
>I realize that I can use nonparametric tests for correlation (I will use
>Spearman), but is there a nonparametric linear regression?  If not, is it
>acceptable to use linear regression analysis on data that is not normally
>distributed as a way to show there is a linear relationship?
>
>thanks in advance..Mike
>


The importance of normality is grossly over-emphasized.
What matters is the distribution of the residuals, not of the
raw variables.   If the residuals have grossly long tails, or if
there are outliers, then you may have to look at more robust forms
of regression than least squares.   Auto-correlation among
consecutive observations, or lack of homogeneity of variance,
is often more important than non-normality.
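
As a rough illustration of checking the residuals rather than the
raw variables, here is a minimal Python sketch.   The data are made
up (a skewed predictor with normal errors, n=147), and the helper
names fit_line, lag1_autocorrelation and variance_ratio are my own
for this example, not anything from SAS.   It fits a least-squares
line, then looks at the lag-1 autocorrelation of the residuals and
at the ratio of residual variances between high and low values of
the predictor.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def lag1_autocorrelation(r):
    """Lag-1 autocorrelation of residuals taken in observation order."""
    m = statistics.fmean(r)
    num = sum((r[i] - m) * (r[i + 1] - m) for i in range(len(r) - 1))
    den = sum((ri - m) ** 2 for ri in r)
    return num / den


def variance_ratio(x, r):
    """Residual variance in the upper half of x over that in the lower half."""
    ordered = [ri for _, ri in sorted(zip(x, r))]
    half = len(ordered) // 2
    return statistics.pvariance(ordered[half:]) / statistics.pvariance(ordered[:half])


# Made-up data: the predictor is strongly skewed, the errors are normal.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(147)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"intercept = {a:.2f}, slope = {b:.2f}")
print(f"lag-1 autocorrelation of residuals = {lag1_autocorrelation(residuals):.2f}")
print(f"residual variance ratio (high x / low x) = {variance_ratio(x, residuals):.2f}")
```

An autocorrelation far from 0, or a variance ratio far from 1, would
be more of a worry than skewness in x or y themselves.   These are
crude summaries only; a plot of residuals against fitted values is
usually more informative.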

There is absolutely no requirement that the predictors (or
independent variables) should have a normal distribution, in fact
the opposite.   Ideally, the predictors should be from a designed
experiment and hence will not even be random.   Most of them
should be towards the outer boundaries of the predictor space.
I guess that most designed experiments would give negative
kurtoses if you go through the mechanics of calculating
coefficients of kurtosis.    If the predictors are from a multivariate
normal distribution, then there is far too much clustering in the
centre of the design space.   Quite often, some of the predictors
are discrete - perhaps (0,1) variables, and hence cannot have
normal distributions.
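
To put numbers on the kurtosis remark, here is a small Python sketch
that computes the excess kurtosis of a few simple sets of design
points.   The designs are hypothetical examples of my own, and the
function uses the plain moment definition m4/m2^2 - 3, not the
bias-corrected version that SAS prints, so the values will differ
slightly from SAS output for small n.

```python
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


# Hypothetical designs: all points at the extremes, points at three
# equally spaced levels, and a balanced (0,1) dummy variable.
designs = {
    "two-level (-1, +1)": [-1.0, 1.0] * 8,
    "three-level (-1, 0, +1)": [-1.0, 0.0, 1.0] * 6,
    "balanced (0,1) dummy": [0.0, 1.0] * 10,
}

for name, points in designs.items():
    print(f"{name}: excess kurtosis = {excess_kurtosis(points):.2f}")
```

The balanced two-level design and the balanced dummy variable both
come out at -2, and the three-level design at -1.5, so all three are
well below the value of 0 for a normal distribution.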

--
Alan Miller, Retired Scientist (Statistician)
CSIRO Mathematical & Information Sciences
Alan.Miller -at- vic.cmis.csiro.au
http://www.ozemail.com.au/~milleraj
http://users.bigpond.net.au/amiller/





