David Judkins wrote:
I am not aware of the capabilities of IVEware, but the general question
of person-level mean squared prediction error is a function of both the
covariates and the imputation procedure.  As Dr. Rubin has pointed out,
minimizing person-level MSPE is not typically a primary goal in the
analysis of surveys and experiments, although it might be important in an
activity like fraud detection.  Nonetheless, reduced person-level MSPE
should also translate into both lower variances on estimated population
and superpopulation marginal parameters and reduced bias on regression
coefficients.  So you want to use as rich a set of covariates in the
imputation as are available to you and to use the model-based
predictions in your imputation to at least some extent.  Unfortunately,
the stronger the usage you make, the more difficult it becomes to
estimate the post-imputation variance.  For example, a predictive-mean
matching approach to imputation defeats multiple imputation as a
variance-estimation technique.  For normally distributed outcomes,

David - It's not clear to me why PMM would invalidate using the Rubin variance estimator for regression coefficient variances. But maybe you are saying that PMM doesn't work if you are primarily interested in estimating a variance parameter (what kind?). -Frank Harrell

really good methods that both utilize covariate information and allow
post-imputation variance estimation are pretty much Bayesian and involve
Gibbs sampling to fit complex models and make reasonable posterior
draws.  (See Schafer's book.) Even they do not cope well with the
natural heaping in income where people round to the nearest thousand
dollars or even worse. I have some papers on how to impute non-normal
outcomes using covariates that are subject to missing values themselves,
but I have not yet been able to develop and validate good
post-imputation variance estimators to go with them.
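
Harrell's question turns on Rubin's variance estimator, so it may help to see the standard combining rules written out. The following is a minimal Python sketch for a scalar estimand: the pooled estimate is the mean of the m point estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name `rubin_combine` is hypothetical, and the sketch assumes each completed-data analysis returns a point estimate and its squared standard error. It takes no position on whether PMM imputations meet the conditions these rules require.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m completed-data estimates with Rubin's combining rules.

    estimates: point estimates from the m imputed datasets
    variances: squared standard errors from the same m analyses
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    u_bar = u.mean()                  # average within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b   # total variance
    # Rubin's original degrees of freedom for the reference t distribution;
    # infinite when the imputations agree exactly (b == 0).
    r = (1.0 + 1.0 / m) * b / u_bar
    df = (m - 1) * (1.0 + 1.0 / r) ** 2 if b > 0 else np.inf
    return q_bar, t, df

q_bar, t, df = rubin_combine([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(q_bar, t, df)
```

In this toy call the between-imputation variance equals the average within-imputation variance, so the total variance comes out about 2.3 times the within-imputation figure alone.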
Your person-level MSPE seems so large that I suspect your software is
not using any covariates.  While that makes post-imputation variance
estimation easy, it seems like you could do better.
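
To make the MSPE point concrete, the following simulation uses hypothetical data in which log income depends linearly on one covariate `x`. It compares a hot deck that ignores covariates with a simplified predictive mean matching (PMM) step that uses `x`. Person-level MSPE can only be computed here because the simulation keeps the true values. Several assumptions apply: MCAR missingness, k = 5 donors, and a PMM step that does not draw the regression parameters. The PMM step is therefore not a proper multiple-imputation procedure and is not meant to represent what IVEware does. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2006)

# Hypothetical data: log income depends linearly on one covariate x.
n = 2000
x = rng.normal(size=n)
y = 10.0 + 0.8 * x + rng.normal(scale=0.6, size=n)
missing = rng.random(n) < 0.3          # MCAR, roughly 30% missing
obs = ~missing

def hot_deck_no_covariates(y, obs, rng):
    """Fill each missing value with a random observed value (ignores x)."""
    y_imp = y.copy()
    y_imp[~obs] = rng.choice(y[obs], size=(~obs).sum(), replace=True)
    return y_imp

def pmm(y, x, obs, rng, k=5):
    """Simplified PMM: take the donor from the k observed cases whose fitted
    values are closest to the missing case's fitted value. Regression
    parameters are fixed at their OLS estimates (no posterior draw)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    yhat = X @ beta
    donors_y, donors_hat = y[obs], yhat[obs]
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        nearest = np.argsort(np.abs(donors_hat - yhat[i]))[:k]
        y_imp[i] = donors_y[rng.choice(nearest)]
    return y_imp

def person_mspe(y_true, imputations, miss):
    """Mean squared prediction error over missing persons and imputations."""
    errs = [(imp[miss] - y_true[miss]) ** 2 for imp in imputations]
    return float(np.mean(errs))

m = 5
hd = [hot_deck_no_covariates(y, obs, rng) for _ in range(m)]
pm = [pmm(y, x, obs, rng) for _ in range(m)]
print("hot deck MSPE:", person_mspe(y, hd, missing))
print("PMM MSPE:     ", person_mspe(y, pm, missing))
```

Under this setup, the covariate-free hot deck's MSPE should sit near twice the marginal variance of `y`. The PMM figure should sit near twice the residual variance. That gap is the "you could do better" in practice.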
The preservation of the marginal first and second order moments of
income seems to support the idea that you are not using any covariates.
The robustness of the model coefficients is harder to reconcile.  I
think this can only happen with a simple imputation procedure if the
missing data rate is negligible or if the model isn't very good to begin
with.  If substantial numbers of subjects were being thrown back and
forth between $3,000 and $100,000 per year, the coefficients in good
models would certainly be attenuated. Maybe you just don't have any
variables that are strongly related to income?
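
The attenuation argument can be sketched numerically. Suppose missing incomes are drawn independently of the covariates and missingness is MCAR with rate p. The covariance between a covariate and the completed outcome then shrinks by a factor of (1 - p), while the covariate's variance is unchanged, so the OLS slope shrinks by the same factor. The following sketch uses hypothetical simulated data; the names `slope` and `attenuation_demo` are illustrative.

```python
import numpy as np

def slope(x, y):
    """OLS slope of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

def attenuation_demo(p_missing, n=50_000, beta=0.8, seed=1):
    """Slope after imputing missing y with no covariates (hypothetical data)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 10.0 + beta * x + rng.normal(scale=0.6, size=n)
    miss = rng.random(n) < p_missing       # MCAR missingness
    y_imp = y.copy()
    # Covariate-free imputation: random draws from the observed y values.
    y_imp[miss] = rng.choice(y[~miss], size=miss.sum(), replace=True)
    return slope(x, y_imp)

for p in (0.02, 0.10, 0.30):
    print(f"missing {p:.0%}: slope {attenuation_demo(p):.3f} "
          f"(expected about {0.8 * (1 - p):.3f})")
```

At a 2% missing rate the slope barely moves, which fits the "negligible missing data rate" explanation. At 30% the attenuation is hard to miss.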

David Judkins Senior Statistician Westat 1650 Research Boulevard Rockville, MD 20850 (301) 315-5970 [EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul T.
Shattuck
Sent: Wednesday, March 29, 2006 11:43 AM
To: impute@lists.utsouthwestern.edu
Subject: [Impute] range of imputed values for income

Hello,

I am using IVEware for multiple imputation for the first time on a large
national health survey. One of the variables imputed is income, and I'm finding that imputed values can vary dramatically within subjects across multiply imputed datasets. For instance, in some cases Person A might have an imputed income of $3,000 in one imputation and $100,000 in another. This within-person variability far exceeds what I'm seeing with other variables in the survey. The distributions, means, and standard deviations of the imputed vs. non-imputed values are comparable. And multivariate regression results using the multiply imputed datasets and the original dataset with missing values are reasonably robust, with the same substantive conclusions and very close coefficient estimates. So I'm wondering whether this degree of within-subject variability across imputations is something to worry about, and potentially an indicator of a misspecified imputation model, or whether this kind of within-subject variability across imputed datasets is typical.

Thanks,

Paul Shattuck



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

_______________________________________________
Impute mailing list
Impute@lists.utsouthwestern.edu
http://lists.utsouthwestern.edu/mailman/listinfo/impute