I am thinking of implementing an S function for multiple
imputation based on the following strategy.
1. Use a type of generalized additive model based on nonparametric
smoothers to predict a sometimes-missing covariable on the basis of
other covariables and the response variable. One promising model
is Tibshirani's AVAS (additivity and variance stabilization)
method, which seeks a monotonic transformation of the variable
being predicted so that the transformed variable has constant
variance across levels of the predictors. Compared with, say,
predicting a highly skewed variable on its original scale, this
should yield a higher R^2 and allow a constant-width window
(epsilon, below) to be used for matching.
2. Use this model to obtain predicted transformed values of the
target sometimes-missing variable. These transformed values
are usually scaled to have mean zero and variance 1.
3. Use predictive mean matching: For each subject having a missing
value of the target variable, compute her predicted mean
transformed value from the semiparametric model and call it
u. Find all subjects with non-missing values whose predicted
values are within epsilon of u. Sample m of those
subjects' actual values, with replacement, and use these as
the m multiple imputations for the target variable for the
subject in question.
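
To make step 3 concrete, here is a minimal sketch of the matching and
sampling, written in Python rather than S and using only the standard
library. It assumes the predicted transformed values from steps 1 and 2
are already available as plain lists. The function name pmm_impute, its
arguments, and the fallback to the single nearest donor when no subject
falls within epsilon are all hypothetical choices made for illustration;
the proposal itself does not say what to do with an empty window.

```python
import random


def pmm_impute(pred_observed, values_observed, pred_missing, epsilon, m, rng=None):
    """Draw m imputations per missing subject by predictive mean matching.

    pred_observed   : predicted transformed values for subjects whose target is observed
    values_observed : actual observed target values for those same subjects
    pred_missing    : predicted transformed values (u) for subjects whose target is missing
    epsilon         : half-width of the matching window on the transformed scale
    m               : number of imputations to draw per subject
    rng             : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    imputations = []
    for u in pred_missing:
        # Donors: observed subjects whose predicted value is within epsilon of u.
        donors = [
            y for p, y in zip(pred_observed, values_observed)
            if abs(p - u) <= epsilon
        ]
        if not donors:
            # Assumption, not part of the original proposal: if the window
            # is empty, fall back to the single nearest observed subject.
            nearest = min(
                range(len(pred_observed)),
                key=lambda i: abs(pred_observed[i] - u),
            )
            donors = [values_observed[nearest]]
        # Sample m donor values with replacement.
        imputations.append([rng.choice(donors) for _ in range(m)])
    return imputations


# Made-up illustrative data, not from any real study.
pred_obs = [-1.2, -0.4, 0.1, 0.3, 1.5]
vals_obs = [2.0, 3.5, 4.1, 4.4, 9.0]
pred_mis = [0.2, 2.5]

draws = pmm_impute(pred_obs, vals_obs, pred_mis, epsilon=0.25, m=5,
                   rng=random.Random(0))
```

In this toy data the first missing subject (u = 0.2) has two donors in
the window, so her imputations are drawn from 4.1 and 4.4. The second
(u = 2.5) has none, so the sketch falls back to the nearest donor and
all m draws are identical, which shows how a too-small epsilon can
understate imputation variability.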
Is this a reasonable multiple imputation strategy?
How does one choose epsilon?
Thanks in advance for any thoughts on this proposal.
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat