[R] Linear Model and Missing Data in Predictors

Lorenzo Isella Tue, 15 Mar 2016 08:20:20 -0700

Dear All,
Here is a situation that surely comes up very often. Suppose you have
the following data:


set.seed(1235)
x1 <- seq(30)                                  # fully observed
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))  # observed on rows 10 to 28
x3 <- c(rnorm(17) - 2, rep(NA, 13))            # observed on rows 1 to 17

y <- exp(seq(1, 5, length.out = 30))


mm <- lm(y ~ x1 + x2 + x3)

That is, you fit a linear regression with several regressors, some of
which have missing values.
This happens to me when I work with time series that I use as
regressors and whose missing values are padded with NAs.
By default, lm discards every incomplete observation and therefore
drops quite a lot of data.
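
As a quick check of how much data is lost, the following minimal sketch
rebuilds the example above and counts the rows that lm actually uses.
It assumes the default na.action (na.omit); the name n_complete is
introduced here only for illustration.

```r
set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length.out = 30))

mm <- lm(y ~ x1 + x2 + x3)

# Rows where all three regressors are observed
# (x2 and x3 overlap only on rows 10 to 17).
n_complete <- sum(complete.cases(x1, x2, x3))
n_complete              # 8

# With the default na.action (na.omit), lm keeps only those rows.
nobs(mm)                # 8
length(mm$na.action)    # 22 rows dropped
```

So only 8 of the 30 observations enter the fit.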
Is there any way to circumvent this? I mean, is there a way to fit
something like a piecewise linear regression where all 3 regressors
are used whenever possible, but the model switches to 1 or 2 of them
where data are missing?
I ask because it is not feasible to work out the values of the missing
data in my regressors, but at the same time I cannot restrict my model
to the intersection of the non-NA values in the 3 regressors. If this
makes sense, do I have to code it myself, or is there a package that
already implements this?
Any suggestions are appreciated.
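
To make the idea concrete, here is a rough sketch of one possible
reading of it: group the rows by which regressors are observed, then
fit a separate lm on each group with whatever regressors are available
there. The names dat, preds, pattern and fits are made up for
illustration, and this is not an established method. In particular, it
produces separate fits with unrelated coefficients rather than a single
model, and the last two rows have only x1, so that fit is degenerate.

```r
set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length.out = 30))
dat <- data.frame(y, x1, x2, x3)

# Label each row by the regressors observed in it, e.g. "x1+x3".
preds   <- c("x1", "x2", "x3")
avail   <- !is.na(dat[preds])   # logical matrix, one column per regressor
pattern <- apply(avail, 1, function(r) paste(preds[r], collapse = "+"))

# One lm per missingness pattern, using only the regressors that are
# fully observed within that group of rows.
fits <- lapply(split(dat, pattern), function(d) {
  rhs <- preds[colSums(is.na(d[preds])) == 0]
  lm(reformulate(rhs, response = "y"), data = d)
})

# Number of observations used by each piecewise fit.
sapply(fits, nobs)
```

Every row is used by exactly one of the fits, but each fit sees only
its own block of rows, so the pieces are not constrained to agree with
each other.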
Cheers

Lorenzo

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
