David Judkins wrote:
I am not aware of the capabilities of IVEware, but the general question
of person-level mean squared prediction error is a function of both the
covariates and the imputation procedure. As Dr. Rubin has pointed out,
minimizing person-level MSPE is not typically a primary goal in the
analysis of surveys and experiments, although it might be important in an
activity such as fraud detection. Nonetheless, reduced person-level MSPE
should also translate into both lower variances for estimates of population
and superpopulation marginal parameters and reduced bias in regression
coefficients. So you want to use as rich a set of covariates in the
imputation as are available to you and to use the model-based
predictions in your imputation to at least some extent. Unfortunately,
the more heavily you rely on them, the more difficult it becomes to
estimate the post-imputation variance. For example, a predictive-mean
matching approach to imputation defeats multiple imputation as a
variance-estimation technique.

David - It's not clear to me why PMM would invalidate the use of the Rubin
variance estimator for regression coefficient variances. But maybe you
are saying that PMM doesn't work if you are primarily interested in
estimating a variance parameter (what kind?). -Frank Harrell

For normally distributed outcomes, really good methods that both utilize
covariate information and allow
post-imputation variance estimation are pretty much Bayesian and involve
Gibbs sampling to fit complex models and make reasonable posterior
draws. (See Schafer's book.) Even these methods do not cope well with the
natural heaping in income, where people round to the nearest thousand
dollars or even more coarsely. I have some papers on how to impute non-normal
outcomes using covariates that are subject to missing values themselves,
but I have not yet been able to develop and validate good
post-imputation variance estimators to go with them.
Your person-level MSPE seems so large that I suspect your software is
not using any covariates. While that makes post-imputation variance
estimation easy, it seems like you could do better.
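The link between covariate use and person-level MSPE can be illustrated with a small simulation. The sketch below is hypothetical throughout: log-income is generated from one fully observed covariate x, nonresponse is completely at random, and two imputation rules are compared: a random hot deck that ignores covariates, and a simplified predictive mean matching (PMM) on x that does not redraw the regression parameters between imputations. Neither is claimed to be what IVEware does. For each nonrespondent, the mean squared error across imputations equals the squared error of that person's average imputation plus the within-person variance across imputations, so large swings across imputed datasets feed directly into person-level MSPE.

```python
import heapq
import random
from statistics import mean, pstdev


def simulate(n=1000, miss_rate=0.3, seed=1):
    """Hypothetical data: log-income depends linearly on one covariate x."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [10 + xi + rng.gauss(0, 0.5) for xi in x]           # true log-income (known only in a simulation)
    missing = [rng.random() < miss_rate for _ in range(n)]  # MCAR nonresponse
    return x, y, missing


def hot_deck(x, y, missing, rng):
    """Random hot deck that ignores covariates: any respondent can be the donor."""
    donors = [yi for yi, mi in zip(y, missing) if not mi]
    return [rng.choice(donors) if mi else yi for yi, mi in zip(y, missing)]


def pmm(x, y, missing, rng, k=5):
    """Simplified predictive mean matching on one covariate.

    Fits OLS of y on x among respondents, then gives each nonrespondent the
    observed y of a donor drawn from the k respondents with the closest
    predicted means. Regression parameters are not redrawn per imputation.
    """
    obs = [i for i, mi in enumerate(missing) if not mi]
    xm = mean(x[i] for i in obs)
    ym = mean(y[i] for i in obs)
    b = sum((x[i] - xm) * (y[i] - ym) for i in obs) / sum((x[i] - xm) ** 2 for i in obs)
    a = ym - b * xm
    pred = [a + b * xi for xi in x]
    out = list(y)  # every missing position is overwritten below
    for i, mi in enumerate(missing):
        if mi:
            nearest = heapq.nsmallest(k, obs, key=lambda j: abs(pred[j] - pred[i]))
            out[i] = y[rng.choice(nearest)]
    return out


def person_level_summary(method, x, y, missing, m=5, seed=2):
    """Average person-level MSPE and within-person SD over m imputations."""
    rng = random.Random(seed)
    draws = [method(x, y, missing, rng) for _ in range(m)]
    idx = [i for i, mi in enumerate(missing) if mi]
    mspe = mean(mean((d[i] - y[i]) ** 2 for d in draws) for i in idx)
    spread = mean(pstdev([d[i] for d in draws]) for i in idx)
    return mspe, spread


if __name__ == "__main__":
    x, y, missing = simulate()
    for name, method in [("hot deck, no covariates", hot_deck), ("PMM on x", pmm)]:
        mspe, spread = person_level_summary(method, x, y, missing)
        print(f"{name:24s} MSPE={mspe:.3f}  within-person SD={spread:.3f}")
```

Under the covariate-free hot deck, each person's imputations are independent draws from the whole observed income distribution, which is the kind of wide swing described in the original question; with a covariate this strongly related to income, the PMM rule should show a clearly smaller MSPE and within-person spread.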
The preservation of the marginal first- and second-order moments of
income seems to support the idea that you are not using any covariates.
The robustness of the model coefficients is harder to reconcile. I
think this can only happen with a simple imputation procedure if the
missing data rate is negligible or if the model isn't very good to begin
with. If substantial numbers of subjects were being thrown back and
forth between $3,000 and $100,000 per year, the coefficients in good
models would certainly be attenuated. Maybe you just don't have any
variables that are strongly related to income?
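The attenuation argument can be checked the same way. In this hypothetical sketch, log-income is a predictor of a simulated health outcome, a chosen fraction of incomes is missing completely at random, and missing values are filled by a covariate-free hot deck. Because the imputed incomes carry no information about the outcome, the covariance between income and outcome shrinks by roughly the missing fraction while the variance of income is preserved, so the fitted slope is pulled toward zero by about that factor. The complete-case slope stays near the true value under this MCAR setup. All names and parameter values are illustrative assumptions.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    xm, ym = mean(x), mean(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    return sxy / sxx


def attenuation_demo(n=5000, miss_rate=0.3, beta=0.5, seed=3):
    """Compare complete-case and hot-deck-imputed slopes (hypothetical data)."""
    rng = random.Random(seed)
    income = [rng.gauss(10, 1) for _ in range(n)]           # hypothetical log-income
    outcome = [beta * v + rng.gauss(0, 1) for v in income]  # hypothetical health score
    missing = [rng.random() < miss_rate for _ in range(n)]  # MCAR nonresponse
    donors = [v for v, mi in zip(income, missing) if not mi]
    # Covariate-free hot deck: imputed income is unrelated to the outcome.
    imputed = [rng.choice(donors) if mi else v for v, mi in zip(income, missing)]
    cc_income = [v for v, mi in zip(income, missing) if not mi]
    cc_outcome = [o for o, mi in zip(outcome, missing) if not mi]
    return ols_slope(cc_income, cc_outcome), ols_slope(imputed, outcome)


if __name__ == "__main__":
    for rate in (0.05, 0.3, 0.6):
        cc, imp = attenuation_demo(miss_rate=rate)
        print(f"missing {rate:.0%}: complete-case slope={cc:.3f}, hot-deck slope={imp:.3f}")
```

Varying the missing rate shows the two escape routes mentioned above: attenuation is negligible when little income is missing, and when the true effect (beta) is small the absolute change in the coefficient is small as well.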
David Judkins
Senior Statistician
Westat
1650 Research Boulevard
Rockville, MD 20850
(301) 315-5970
[EMAIL PROTECTED]
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul T. Shattuck
Sent: Wednesday, March 29, 2006 11:43 AM
To: impute@lists.utsouthwestern.edu
Subject: [Impute] range of imputed values for income
Hello,
I am using IVEware for multiple imputation for the first time on a large
national health survey. One of the variables imputed is income and I'm
finding that imputed values can vary dramatically within subjects across
multiply imputed datasets. For instance, in some cases Person A might
have an imputed income of $3,000 in one imputation and then $100,000
in another imputation. This within-person variability far exceeds what
I'm seeing with other variables in the survey. The distributions,
means, and standard deviations of the imputed vs. non-imputed values are
comparable. And multivariate regression results using the multiply
imputed datasets and the original dataset with missing values are
reasonably robust, with the same substantive conclusions and very close
coefficient estimates. So, I'm wondering if this degree of
within-subject variability across imputations is something to worry
about, and potentially an indicator of a mis-specified imputation
model, or whether this kind of within-subject variability across
imputed datasets is typical.
Thanks,
Paul Shattuck
--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
_______________________________________________
Impute mailing list
Impute@lists.utsouthwestern.edu
http://lists.utsouthwestern.edu/mailman/listinfo/impute