Also consider the redun function in the Hmisc package. It does not use the
response variable; instead it uses flexible nonlinear additive models to
predict each predictor variable from all the others, in a stepwise, formal
redundancy analysis.
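As a rough illustration of what such a redundancy analysis does, here is a minimal base-R sketch. It is a linear-only simplification and NOT Hmisc's actual algorithm: redun fits flexible spline-based models, whereas this hypothetical helper, linear_redundancy(), regresses each predictor linearly on all the others, drops the most predictable one while its R^2 exceeds a cutoff (0.9 here, an assumed value), and repeats. The response is never used. With Hmisc installed, the real call would look something like redun(~ x1 + x2 + x3 + x4, data = d).

```r
# Hypothetical helper: a linear-only simplification of the idea behind
# Hmisc::redun (redun itself uses flexible spline fits, not plain lm).
linear_redundancy <- function(X, r2_cutoff = 0.9) {
  X <- as.data.frame(X)
  removed <- character(0)
  repeat {
    vars <- names(X)
    if (length(vars) < 2) break
    # R^2 from predicting each variable linearly from all remaining ones
    r2 <- vapply(vars, function(v) {
      others <- as.matrix(X[setdiff(vars, v)])
      summary(lm(X[[v]] ~ others))$r.squared
    }, numeric(1))
    if (max(r2) < r2_cutoff) break
    # Drop the most predictable (most redundant) variable, then repeat
    worst <- vars[which.max(r2)]
    removed <- c(removed, worst)
    X[[worst]] <- NULL
  }
  list(removed = removed, kept = names(X))
}

# Simulated example (invented data): x4 is nearly x1 + x2 + x3
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$x4 <- d$x1 + d$x2 + d$x3 + rnorm(n, sd = 0.1)
res <- linear_redundancy(d)
res$removed
```

Because x4 carries the largest total variance, it is the most predictable from the others and is the one flagged as redundant; the remaining three are independent and survive.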
Frank
Ben Bolker wrote:
Peter Flom <peterf <at> brainscope.com> writes:
Robin Williams wrote:
<<<<
Is there any facility in R to perform a stepwise process on a model
that will remove any highly correlated explanatory variables? I am told
SPSS has one. I have a large number of variables (some correlated),
which I would like to just chuck into a model, run stepwise on, and
see what comes out the other end, perhaps to give me an idea of which
variables I should focus on.
Thanks for any help / suggestions.
>>>>
Stepwise is a bad method of selecting variables. Far better methods are the
LASSO and LAR (least angle regression), available in the lars and lasso2
packages. However, while both these methods are good, neither is a
substitute for substantive knowledge.
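To show what the lasso penalty actually does, here is a minimal base-R sketch under stated assumptions. It solves the lasso by cyclic coordinate descent, which is a different algorithm from the LARS path computed by the lars package; the function names lasso_cd() and soft_threshold(), the standardisation step, and the value lambda = 0.3 are all illustrative choices. With lars installed, the package route would be along the lines of lars(X, y, type = "lasso").

```r
# Soft-thresholding operator: the closed-form lasso update for one coefficient
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical helper: lasso by cyclic coordinate descent (NOT the LARS
# algorithm). Minimises (1/(2n)) * ||y - X b||^2 + lambda * sum(|b|).
lasso_cd <- function(X, y, lambda, n_iter = 500, tol = 1e-8) {
  X <- scale(X)            # centre and scale columns (sd uses n - 1)
  y <- y - mean(y)         # centre response, so no intercept is needed
  n <- nrow(X); p <- ncol(X)
  b <- numeric(p)
  r <- y                   # current residual y - X b (b starts at zero)
  for (it in seq_len(n_iter)) {
    b_old <- b
    for (j in seq_len(p)) {
      r <- r + X[, j] * b[j]          # partial residual excluding column j
      zj <- sum(X[, j] * r) / n
      b[j] <- soft_threshold(zj, lambda) / (sum(X[, j]^2) / n)
      r <- r - X[, j] * b[j]          # put the updated contribution back
    }
    if (max(abs(b - b_old)) < tol) break
  }
  setNames(b, colnames(X))
}

# Simulated example (invented data): only x1 and x2 matter
set.seed(42)
n <- 100
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)
coefs <- lasso_cd(X, y, lambda = 0.3)
round(coefs, 3)
```

The penalty shrinks the real effects towards zero and sets small, noise-driven coefficients exactly to zero, which is what makes the lasso usable for selection; a very large lambda zeroes everything.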
Also, the key thing is not so much whether variables are correlated, but
whether they are collinear, which is different. If you have a great many
variables, you can have a high degree of collinearity even with no high
pairwise correlations. I've not done this in R, but
RSiteSearch("collinearity", restrict = 'functions') yields 34 hits.
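The point about collinearity without high pairwise correlations is easy to check numerically. In this simulated sketch (the data are invented purely for illustration), z10 is almost the sum of nine independent predictors, so each of its pairwise correlations is only about 1/3. Yet the variance inflation factors, computed here as the diagonal of the inverse correlation matrix, and the condition number from base R's kappa() are large.

```r
# Simulated predictors: z1..z9 independent, z10 nearly their sum
set.seed(7)
n <- 500
Z <- matrix(rnorm(n * 9), n, 9, dimnames = list(NULL, paste0("z", 1:9)))
X <- cbind(Z, z10 = rowSums(Z) + rnorm(n, sd = 0.1))

R <- cor(X)
# Largest absolute pairwise correlation: modest
max_pairwise <- max(abs(R[upper.tri(R)]))
# Variance inflation factors: diagonal of the inverse correlation matrix
vif <- diag(solve(R))
# 2-norm condition number of the standardised design matrix
cond_number <- kappa(scale(X), exact = TRUE)

round(c(max_pairwise = max_pairwise, max_vif = max(vif),
        kappa = cond_number), 2)
```

No single pair of variables looks alarming, but every predictor is almost perfectly explained by the others, which is exactly the situation pairwise screening misses.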
HTH
Peter
Another suggestion would be to do PCA on the predictor variables, and to
read Frank Harrell's book _Regression Modeling Strategies_.
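A minimal sketch of the PCA suggestion using base R's prcomp(), on simulated data invented for illustration: three predictors share one latent factor and a fourth is independent. Standardising first (scale. = TRUE) stops variables measured on larger scales from dominating. The leading component scores are mutually uncorrelated and could be used as predictors in place of the original correlated variables; keeping two components here is an arbitrary assumption.

```r
# Simulated predictors: a, b, c share one latent factor; d is independent
set.seed(3)
n <- 300
latent <- rnorm(n)
X <- data.frame(
  a = latent + rnorm(n, sd = 0.3),
  b = latent + rnorm(n, sd = 0.3),
  c = latent + rnorm(n, sd = 0.3),
  d = rnorm(n)
)

# PCA on centred and scaled predictors (the response is not used)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
# Proportion of total variance explained by each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

# Scores on the first two components, usable as new predictors
scores <- pca$x[, 1:2]
```

The first component absorbs most of the shared variation in a, b and c, so a model built on the component scores sidesteps the collinearity among the originals, at the cost of less directly interpretable predictors.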
cheers
Ben Bolker
Frank E Harrell Jr, Professor and Chair
Department of Biostatistics, School of Medicine, Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.