I posted this a few weeks ago and didn't get any responses, so I'm trying
again.


I have easy-to-use R and S-Plus functions for bootstrap model validation
and for semiparametric multiple imputation using generalized additive
models and predictive mean matching.  The problem is that we frequently
need to validate models where multiple imputation has been done, and the
software does not currently allow this.  

The bootstrap model validation procedure in a nutshell is as follows. 
Compute a measure of apparent predictive accuracy of the final model you
wish to publish.  Then draw, say, 100 samples with replacement from the
original data matrix.  For each sample re-build the model, using any
stepwise variable selection and other algorithms that were used to develop
the final model.  Compute the apparent accuracy of this model in the
bootstrap sample.  Apply the same parameter estimates, without re-fitting,
to the original sample and compute the accuracy there.  Compute the
drop-off in accuracy when moving from the bootstrap sample to the original
sample.  Average this drop-off over the 100 bootstrap samples.  This is
Efron's "optimism" estimate.  Subtract the optimism from the original
apparent accuracy index to get a bias- or overfitting-corrected estimate
of the accuracy the model would likely achieve in a new sample
of similar subjects.
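
To make the steps above concrete, here is a minimal sketch in Python
(standard library only).  It is not my R/S-Plus code: the one-predictor
least-squares model, the R-squared accuracy index, and the names fit_line,
r_squared and bootstrap_validate are all assumptions chosen to keep the
example short.  A real validation would repeat the full model-building
process, including any variable selection, at the marked line.

```python
import random

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor (toy model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope

def r_squared(coef, xs, ys):
    """Accuracy index: 1 - SSE/SST with the coefficients held fixed."""
    a, b = coef
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

def bootstrap_validate(xs, ys, n_boot=100, seed=0):
    """Return (apparent, optimism, corrected) accuracy."""
    rng = random.Random(seed)
    n = len(xs)
    # Apparent accuracy of the final model on the original data.
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    drops = []
    for _ in range(n_boot):
        # Sample rows with replacement from the original data.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Re-build the model (all model-building steps would go here).
        coef = fit_line(bx, by)
        # Accuracy in the bootstrap sample minus accuracy of the same
        # frozen coefficients in the original sample.
        drops.append(r_squared(coef, bx, by) - r_squared(coef, xs, ys))
    optimism = sum(drops) / n_boot
    return apparent, optimism, apparent - optimism
```

Calling bootstrap_validate(xs, ys) on two equal-length lists of numbers
returns the apparent index, the average drop-off, and their difference.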

One easy-to-program way to validate a model in which multiple imputation
has been used in model fitting is to draw 100 imputed values and to bring
one in for each of the 100 bootstrap repetitions.  But then each bootstrap
repetition would not average out the between-imputation variation in
parameter estimates, so it seems to me that this will result in more
variability than is appropriate.  Can anyone think of a reasonable
algorithm that avoids having to do 5 or 10 multiple imputations fresh for
each bootstrap sample, i.e., avoids having to start from scratch with each
bootstrap sample to develop the imputation models?
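
For reference, the easy-to-program scheme described at the start of this
paragraph could be sketched roughly as follows, again in Python.  The
random hot-deck fill is only a placeholder for the generalized additive
model / predictive mean matching imputation, and the function names are
hypothetical.  The point is just the pairing: all imputations are drawn
once, up front, and completed dataset b is used only for bootstrap
repetition b.

```python
import random

def hot_deck_impute(xs, rng):
    """Placeholder imputer: fill each missing x (None) with a randomly
    chosen observed x.  Stands in for the real imputation model."""
    observed = [x for x in xs if x is not None]
    return [x if x is not None else rng.choice(observed) for x in xs]

def one_imputation_per_bootstrap(xs, ys, n_boot=100, seed=0):
    """Return a list of n_boot (bx, by) bootstrap samples, where sample b
    is drawn from the b-th completed (imputed) copy of the data."""
    rng = random.Random(seed)
    n = len(xs)
    # Draw all imputations once from the original data; the imputation
    # model is never re-developed inside the bootstrap loop.
    completed = [hot_deck_impute(xs, rng) for _ in range(n_boot)]
    samples = []
    for filled in completed:
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(([filled[i] for i in idx], [ys[i] for i in idx]))
    return samples
```

Each (bx, by) pair would then go through the usual re-fit and optimism
calculation, which is exactly where the extra between-imputation
variability I am worried about enters.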

Thanks,

Frank 

---
Frank E Harrell Jr    Professor and Chair            School of Medicine
                      Department of Biostatistics    Vanderbilt University
