You've only given a very general description of your problem. You haven't
even told us what your dependent variable is. So all of the advice will be
quite general.

When your data are autocorrelated and you ignore the autocorrelation, any
tests and any confidence intervals that you produce are invalid.

How invalid they are depends on a lot of things, but you could grossly
overstate your precision if there is a strong positive autocorrelation, or
grossly understate it if there is a strong negative autocorrelation. With
positive autocorrelation, neighboring observations carry overlapping
information, so the effective sample size is smaller than the nominal one
and the usual standard errors come out too small.
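A toy simulation (my own illustration, not from the original question) makes the point concrete: generate an AR(1) series with strong positive autocorrelation and compare the naive standard error of the mean, s / sqrt(n), with the actual spread of the sample mean across many replications.

```python
import random, statistics

random.seed(1)

def ar1_series(n, phi, sigma=1.0):
    """Generate an AR(1) series x[t] = phi * x[t-1] + Gaussian noise."""
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + random.gauss(0.0, sigma)
        series.append(prev)
    return series

n, phi, reps = 200, 0.8, 2000
means = [statistics.fmean(ar1_series(n, phi)) for _ in range(reps)]

# The naive SE pretends the observations are independent.
naive_se = statistics.pstdev(ar1_series(n, phi)) / n ** 0.5
# The empirical SE is the actual spread of the mean across replications.
true_se = statistics.pstdev(means)
print(naive_se, true_se)  # true_se is roughly three times naive_se
```

The naive formula reports far more precision than the data actually deliver, which is exactly why the usual tests and intervals are invalid here.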

Furthermore, autocorrelation will often induce artefactual patterns in the
data, making some of the estimates themselves misleading. For example, a
strong positive autocorrelation will often create artefactual drifts upward
and downward that might be mistaken for a trend or a seasonal pattern.

People tend to look at violation of assumptions as a bad thing, but you
could also look at it as an opportunity. In regression, you fit your model
and hope that the residuals look like white noise. When they do look like
white noise, you know that you have extracted the maximum amount of
information from your data, and that there are no systematic trends,
patterns, or correlations among the part of the data that the model does not
predict.

When the residuals do not look like white noise (because there is a
non-linear trend, heterogeneity in the variance, or autocorrelation), there
is information left in the residuals that you can extract to produce an
even better model.
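One simple check along these lines (a sketch with made-up numbers, not the poster's data) is to fit a least-squares line and inspect the lag-1 autocorrelation of the residuals: a value near zero is consistent with white noise, while a large value signals structure the model has not captured.

```python
import random, statistics

random.seed(2)

def ols_fit(x, y):
    """Least-squares intercept and slope for a straight line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a sequence."""
    m = statistics.fmean(r)
    num = sum((r[t] - m) * (r[t - 1] - m) for t in range(1, len(r)))
    return num / sum((v - m) ** 2 for v in r)

# Simulate y = 2 + 0.5*x with AR(1) errors (true autocorrelation 0.7).
n = 300
x = [float(t) for t in range(n)]
errors, prev = [], 0.0
for _ in range(n):
    prev = 0.7 * prev + random.gauss(0.0, 1.0)
    errors.append(prev)
y = [2.0 + 0.5 * xi + e for xi, e in zip(x, errors)]

a, b = ols_fit(x, y)
r1 = lag1_autocorr([yi - (a + b * xi) for xi, yi in zip(x, y)])
print(r1)  # well above zero, flagging leftover autocorrelation
```

A formal version of this diagnostic is the Durbin-Watson test, which most regression packages report.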

So why wouldn't you want to improve your regression model? Well, maybe
because you need to keep things simple, or maybe you don't have access to
the right software, or maybe you are under serious deadline pressures.

George Box has a saying that is often quoted on this list: "All models are
wrong, but some models are useful." Is your current model useful? Well, all
the tests and confidence intervals are invalid, but if the autocorrelation
is negative then at least you are erring on the conservative side. If it is
positive, your intervals are too narrow, and I'd also worry about the
artefactual trends that positive autocorrelation can produce.

Also look at it this way. It might turn out that a more complex model is
overkill, but you will never know for sure unless you take the time to fit
the more complex model. And if you encounter this type of data often in your
work, it would make sense to learn and try some methods that have been shown
to work well with autocorrelation.
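One such method, sketched below with my own simulated numbers (this is an illustration of the general idea, not something from the original question), is a single Cochrane-Orcutt step for AR(1) errors: fit by ordinary least squares, estimate rho from the lag-1 autocorrelation of the residuals, quasi-difference both variables, and refit. The transformed errors are approximately uncorrelated, so the usual inference is much closer to valid.

```python
import random, statistics

random.seed(3)

def ols_fit(x, y):
    """Least-squares intercept and slope for a straight line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a sequence."""
    m = statistics.fmean(r)
    num = sum((r[t] - m) * (r[t - 1] - m) for t in range(1, len(r)))
    return num / sum((v - m) ** 2 for v in r)

# Simulate y = 2 + 0.5*x with AR(1) errors (true rho = 0.7).
n = 400
x = [float(t) for t in range(n)]
errors, prev = [], 0.0
for _ in range(n):
    prev = 0.7 * prev + random.gauss(0.0, 1.0)
    errors.append(prev)
y = [2.0 + 0.5 * xi + e for xi, e in zip(x, errors)]

# Step 1: OLS fit, then estimate rho from the residuals.
a0, b0 = ols_fit(x, y)
rho = lag1_autocorr([yi - (a0 + b0 * xi) for xi, yi in zip(x, y)])

# Step 2: quasi-difference and refit. The slope estimate applies to the
# original model; the fitted intercept is the original one times (1 - rho).
xs = [x[t] - rho * x[t - 1] for t in range(1, n)]
ys = [y[t] - rho * y[t - 1] for t in range(1, n)]
a1, b1 = ols_fit(xs, ys)
print(round(rho, 2), round(b1, 2))  # rho near 0.7, slope near 0.5
```

In practice you would use a packaged routine (generalized least squares, or an ARMA-errors regression) rather than hand-rolling the transformation, but the sketch shows how little machinery the correction actually requires.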

Finally, you allude to the fact that you have thousands of independent
variables. If this is true, then you may have other problems that are far
worse than autocorrelation. The traditional regression methods that work
well for a few dozen independent variables will fail miserably when you have
thousands of independent variables. I'm working a lot now with microarray
data, where scientists can measure the expression of thousands of genes on a
single slide or chip. With these experiments, statisticians have had to
develop entirely new methods because the more traditional methods (like
least squares regression) fall apart.
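A tiny example (my own, with made-up numbers) shows the core of the problem: with more predictors than observations, the least-squares problem is underdetermined, so many different coefficient vectors reproduce the data exactly and the usual estimates are not unique.

```python
# 2 observations, 3 predictors: more unknowns than equations.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 3.0]

def predict(X, b):
    """Linear predictions X @ b, written out without any libraries."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]

b1 = [2.0, 3.0, 0.0]  # one coefficient vector that fits exactly
b2 = [1.0, 2.0, 1.0]  # a completely different one that also fits exactly
print(predict(X, b1), predict(X, b2))  # both equal y
```

Methods built for this setting (ridge and lasso regression, for example) resolve the non-uniqueness by penalizing the coefficients, which is one reason they have displaced plain least squares in high-dimensional work.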

Best of luck!

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to
http://www.childrens-mercy.org/stats.
