Dear David-
Although the multiple imputation estimator for coefficients is simply
the average of coefficients across imputed data sets, standard errors
cannot simply be averaged in the same way. This is because, beyond the
variability associated with the coefficient estimates in each
individual imputed data set (the "within-imputation" component of
variance), additional variability arises when estimating *across* the
different data sets--the "between-imputation" component of variance.
The formula for calculating standard errors is:
se-MI = sqrt(u-bar-M + ((M+1)/M)*b-M),
where:
se-MI is the standard error of a multiply imputed coefficient;
M is the number of imputed data sets;
u-bar-M is the mean within-imputation variance across imputed data
sets (i.e., (1/M) * sigma s^^2-i, where "sigma" is a summation, "s" is
the standard error, "^^" means squared, and "i" indexes imputed data
sets);
and b-M is the between-imputation variance, that is, the sample
variance of the coefficient estimates around their mean, computed with
the usual M-1 denominator (i.e., b-M = (1/(M-1)) * sigma (e-i -
e-bar-MI)^^2, where e-i is the coefficient estimate for imputed data
set i and e-bar-MI is the mean of coefficients across imputed data
sets). The (M+1)/M multiplier on b-M is the adjustment for using a
finite number of imputations; it shrinks toward 1 as M grows.
Sorry for the awkward notation. You can get a much prettier version
of these equations on pp. 108-109 of:
Raghunathan T.E. (2004). "What do we do with missing data? Some
options for analysis of incomplete data", Annual Review of Public
Health, 25, 99-117.
I see that others have already suggested implementations in software
packages. That's great news: I've always done it manually in a
spreadsheet program!
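For anyone who would rather script it than use a spreadsheet, here is
a minimal sketch of the same calculation in Python. It assumes you
have already collected, for one coefficient, the M estimates and their
M standard errors from the separate analyses. The function name
pool_rubin and the example numbers are made up for illustration; note
that the total variance is square-rooted at the end to give a standard
error.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across M imputed data sets.

    estimates  -- coefficient estimate from each imputed data set
    std_errors -- the matching standard error from each data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need M >= 2 estimates with matching standard errors")

    e_bar = mean(estimates)                      # pooled coefficient
    u_bar = mean([se ** 2 for se in std_errors])  # within-imputation variance
    b = variance(estimates)                      # between-imputation variance (M-1 denominator)

    total_variance = u_bar + ((m + 1) / m) * b
    return e_bar, math.sqrt(total_variance)


if __name__ == "__main__":
    # Hypothetical results from M = 3 imputed data sets.
    coef, se = pool_rubin([0.50, 0.60, 0.55], [0.10, 0.10, 0.10])
    print(f"pooled estimate = {coef:.4f}, pooled SE = {se:.4f}")
```

This handles a single coefficient at a time; for several coefficients
you would call it once per coefficient.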
Hope this helps,
David
At 09:38 AM 11/11/2009, David Judkins wrote:
Well, I think this is the first question to the group since list
ownership changed. I wonder how many people are signed up now? It
hasn't been a very active list for a long time. Anyway, here is my question.
I have a dataset with multiple imputations. It is from a five-arm
GRT. One arm is a control and the other four are active. I want to
test for variation in mean responses across the four active
arms. Proc Mixed will give me a test statistic based on each
multiple imputation. But how do I combine these?
One of my colleagues found something in the HLM manual that would
suggest that the replicates of test statistics other than
t-statistics are averaged with no attention paid to the variability
among them. Sound accurate about HLM? Is that the best we can do?
David Judkins
Senior Statistician
Westat
1650 Research Boulevard
Rockville, MD 20850
(301) 315-5970
==============================
David Crow
Associate Director
Survey Research Center
University of California, Riverside
900 University Avenue
1419 Spieth Hall
Riverside, CA 92521
Tel.: (951) 827-4028
Fax: (951) 827-4035
Web: survey.ucr.edu
==============================
"It is the mark of an educated mind to rest satisfied with the degree
of precision which the nature of the subject admits and not to seek
exactness where only an approximation is possible." Aristotle