In the 20 July 2007 issue of Statistics in Medicine, Harel and Zhou have
a very nice paper overviewing MI and software for it [I have to comment
though on the use of the ugly underscore as an assignment operator in
their S-Plus code, and wish they had covered my R/S-Plus routines :-)].
In their paper they outline the approximate Bayesian bootstrap (ABB) in
the predictive mean matching (PMM) setting for obtaining m multiple
imputations thus (quoting from Harel & Zhou):
- Bootstrap a sample from the complete data
- Fit a model predicting the missing variable
- Temporarily fill all missing values and apply the previous stage to
incomplete cases in order to compute predicted means [I assume this
means to initiate the overall process with convenient fill-ins for
imputing variables and for the 2nd time through the large loop one
remembers the imputed values of the imputers from the 1st time, etc.
But this step seems to be out of order with the previous step.]
- Match each incomplete case with m complete cases based on the distance
of the predicted means
- Randomly choose one of the matched complete cases and use this case to
impute all variables of the incomplete case
- Repeat this process m times to get m completed datasets
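
For concreteness, here is a minimal Python sketch of one way to read the quoted steps. It is an interpretation, not Harel and Zhou's code. It assumes a single incomplete variable y (None marks a missing value), a single fully observed predictor x, and a straight-line imputation model. The names fit_line and pmm_impute are hypothetical, and k is used for the number of candidate donors to keep it apart from m, the number of imputations.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: fall back to the mean
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def pmm_impute(x, y, m=5, k=3, seed=0):
    """Return m completed copies of y (None marks a missing value)."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    completed = []
    for _ in range(m):
        # Bootstrap the cases that have y observed
        boot = [rng.choice(obs) for _ in obs]
        # Fit the imputation model on the bootstrap sample
        a, b = fit_line([x[i] for i in boot], [y[i] for i in boot])
        # Predicted means for every case, complete or not
        pred = [a + b * xi for xi in x]
        filled = list(y)
        for i in mis:
            # The k complete cases closest in predicted mean
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            # Draw one donor at random and borrow its observed y
            filled[i] = y[rng.choice(donors)]
        completed.append(filled)
    return completed
```

With one incomplete variable, the "temporarily fill all missing values" step is not needed, which is part of why its position in the quoted list is hard to pin down. Every imputed value here is an observed y borrowed from a donor, so imputations never fall outside the observed range.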
A number of questions arise:
- I thought that ABB involved sampling with replacement from a sample
drawn with replacement from the cases that are complete with respect to
the variable being imputed. I'm not clear on the algorithm they wrote,
but it seems a bit different from that.
- This seems to be different from the MICE (multiple imputation using
chained equations) approach in which imputations for multiple variables
that are missing are done in sequence and not simultaneously.
- ABB does not seem to be a totally prescriptive approach. How do we
know when we are doing it correctly? With regression imputation one
could fit an imputation model on a sample with replacement from a sample
with replacement of non-missing cases, or one could fit the imputation
model on a sample with replacement then take a sample with replacement
of the residuals off this model to assign imputations. With PMM there
are many choices to make.
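
To make the two regression-imputation choices in the last question concrete, here is a minimal Python sketch under the same simplifying assumptions (one predictor, a straight-line model). All function names are hypothetical. For the first choice the text does not say how imputations are assigned once the model is fitted, so the sketch simply uses the fitted mean; that part is an assumption.

```python
import random

def double_resample(cases, rng):
    """Sample with replacement from a sample with replacement."""
    first = [rng.choice(cases) for _ in cases]
    return [rng.choice(first) for _ in cases]

def fit_line(pairs):
    """Least-squares intercept and slope from (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:  # degenerate resample: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return my - b * mx, b

def impute_double_bootstrap(pairs, x_missing, rng):
    """Choice 1: fit on a doubly resampled set of non-missing cases.
    Using the fitted mean as the imputation is an assumption."""
    a, b = fit_line(double_resample(pairs, rng))
    return [a + b * x for x in x_missing]

def impute_resampled_residuals(pairs, x_missing, rng):
    """Choice 2: fit on one bootstrap sample, then add a residual
    drawn with replacement from that fit."""
    boot = [rng.choice(pairs) for _ in pairs]
    a, b = fit_line(boot)
    resid = [y - (a + b * x) for x, y in boot]
    return [a + b * x + rng.choice(resid) for x in x_missing]
```

The two functions differ in where the second layer of randomness enters: through the cases used to fit the model, or through the residual added to the prediction.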
I would appreciate any thoughts anyone has on these issues.
Thanks
Frank
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University