Hi folks,
I'm writing with a question about how to develop an imputation model when
(a) there are many potential variables to include and (b) the number of
iterations required for the MCMC chain to stabilize is very high (~3000)
when a large number of variables are included in the imputation model. I'll
do my best to describe our situation briefly:

THE STUDY
Data from 48 people were collected at six time points, and include over
2,000 variables. Each of the research questions requires running a multiple
regression in which 2-3 variables assessed at earlier time points predict a
variable assessed at the last time point. All data are available for the
outcome variable, but there are missing data for all of the predictors
(ranging from 5% to 31% missing).

DEVELOPING THE IMPUTATION MODEL
We have tried two basic approaches to developing the imputation model. One
is simply to include in the imputation model all of the variables that will
appear in any of the analyses. This imputation model consists of around 35
variables. The other approach was to select a much larger pool of potential
variables to consider for inclusion in the imputation model. We identified
all variables that we believed would be associated with our main variables
of interest. We then ran a series of stepwise regressions as a shortcut for
identifying a smaller set of variables that uniquely predicted each of the
main variables for which data were missing. This
smaller set contained 18 variables, which--when added to the main
variables--led to an imputation model of 53 variables.
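
To illustrate the kind of screening involved, here is a minimal Python
sketch. It is not the stepwise procedure we actually ran: it swaps in a
simpler pairwise-complete correlation screen. The function names, the
variable names, the toy data, and the 0.4 cutoff are all hypothetical.

```python
import math

def pairwise_corr(x, y):
    """Pearson r using only cases where both values are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0  # too few complete pairs to say anything
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no information
    return sxy / math.sqrt(sxx * syy)

def screen_auxiliaries(target, candidates, cutoff=0.4):
    """Names of candidates with |r| >= cutoff against target, strongest first."""
    scored = [(abs(pairwise_corr(target, v)), name) for name, v in candidates.items()]
    return [name for r, name in sorted(scored, reverse=True) if r >= cutoff]

# Made-up toy data: one incomplete main variable and two candidate auxiliaries.
target = [1.0, 2.0, None, 4.0, 5.0, 6.0]
candidates = {
    "aux_a": [1.1, 1.9, 3.2, 4.1, None, 6.2],
    "aux_b": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
selected = screen_auxiliaries(target, candidates)
```

With only 48 cases, any screen of this kind is noisy, so the cutoff should
be read as a rough filter rather than a firm rule.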

QUESTION
When we generate imputed data sets with the smaller imputation model, the
chain stabilizes relatively quickly (a little over 100 iterations are
needed). In contrast, over 3000 iterations are needed with the larger
imputation model. Should we use the smaller imputation model, even if it
doesn't include variables that we know are uniquely predictive of variables
for which there are missing data?
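
To make "stabilizes" concrete, the sketch below shows one common way to
gauge it: track a parameter across iterations and find the smallest lag at
which its autocorrelation becomes negligible. This is an illustration under
assumptions, not necessarily the diagnostic our software reports. The
function names, the 0.05 threshold, and the simulated AR(1) traces are
hypothetical stand-ins for real chain output.

```python
import random

def autocorr(trace, lag):
    """Lag-k sample autocorrelation of a parameter trace."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((v - mean) ** 2 for v in trace)
    if var == 0:
        return 0.0
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(n - lag))
    return cov / var

def iterations_to_independence(trace, threshold=0.05, max_lag=None):
    """Smallest lag where |autocorrelation| < threshold, or None if never reached."""
    if max_lag is None:
        max_lag = len(trace) // 2
    for lag in range(1, max_lag + 1):
        if abs(autocorr(trace, lag)) < threshold:
            return lag
    return None

def simulate_ar1(rho, n, seed):
    """Simulated stand-in for a chain: AR(1) series with dependence rho."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# A weakly dependent chain versus a strongly dependent one.
fast_lag = iterations_to_independence(simulate_ar1(0.2, 3000, seed=1))
slow_lag = iterations_to_independence(simulate_ar1(0.95, 3000, seed=2))
```

In this toy setup the strongly dependent chain needs many more iterations
between draws, which is the pattern we see with the larger imputation model.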

Thanks in advance for your thoughts!!
Jon

-- 
***Please note change of email to [email protected]***

Jonathan Mohr
Assistant Professor
Department of Psychology
Biology-Psychology Building
University of Maryland
College Park, MD 20742-4411

Office phone: 301-405-5907
Fax: 301-314-5966
Email: [email protected]
