Dear David-

Although the multiple imputation estimator for coefficients is simply the average of coefficients across imputed data sets, standard errors cannot be treated that way. This is so because beyond the variability associated with the coefficient estimates in each individual imputed data set (the "within-imputation" component of variance), additional variability arises when estimating *across* the different data sets--the "between-imputation" component of variance.


The formula for calculating standard errors is:


se-MI = sqrt(u-bar-M + ((M+1)/M)*b-M),


where:


se-MI is the standard error of a multiply imputed coefficient;


M is the number of imputed data sets;


u-bar-M is the mean variance across imputed data sets (i.e., (1/M) * sigma s^^2-i, where "sigma" is a summation, "s" is the standard error, "^^" means squared, and "i" indexes imputed data sets);


and b-M is the variance of the coefficient estimates around their mean, computed with the usual sample-variance divisor of M-1 (i.e., b-M = (1/(M-1)) * sigma(e-i - e-bar-MI)^^2, where e-i is the coefficient estimate for imputed data set i and e-bar-MI is the mean of coefficients across imputed data sets).


Sorry for the awkward notation. You can get a much prettier version of these equations in pp. 108-109 of:


Raghunathan T.E. (2004). "What do we do with missing data? Some options for analysis of incomplete data", Annual Review of Public Health, 25, 99-117.


I see that others have already suggested implementations in software packages. That's great news: I've always done it manually in a spreadsheet program!
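
For anyone who would rather script the calculation than use a spreadsheet, here is a minimal Python sketch of the same arithmetic for a single coefficient. The function name pool_rubin and the example numbers are made up for illustration and do not come from any package or from real data.

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across M imputed data sets.

    estimates  -- coefficient estimate from each imputed data set
    std_errors -- the matching standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")

    # Pooled coefficient: simple average across imputed data sets.
    e_bar = sum(estimates) / m

    # Within-imputation variance: mean of the squared standard errors.
    u_bar = sum(s ** 2 for s in std_errors) / m

    # Between-imputation variance: sample variance of the estimates (divisor M-1).
    b = sum((e - e_bar) ** 2 for e in estimates) / (m - 1)

    # Total variance, then its square root for the standard error.
    total_var = u_bar + (m + 1) / m * b
    return e_bar, math.sqrt(total_var)


# Illustrative (made-up) inputs from M = 3 imputed data sets.
coefs = [1.0, 2.0, 3.0]
ses = [0.5, 0.5, 0.5]
estimate, se = pool_rubin(coefs, ses)
print(estimate, se)
```

Note that simply averaging the three standard errors would give 0.5 here, while the pooled value is considerably larger because the coefficients themselves vary so much from one imputed data set to the next.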


Hope this helps,

David



At 09:38 AM 11/11/2009, David Judkins wrote:
Well, I think this is the first question to the group since list ownership changed. I wonder how many people are signed up now? It hasn't been a very active list for a long time. Anyway, here is my question.

I have a dataset with multiple imputations. It is from a five-arm GRT. One arm is a control and the other four are active. I want to test for variation in mean responses across the four active arms. Proc Mixed will give me a test statistic based on each multiple imputation. But how do I combine these?

One of my colleagues found something in the HLM manual that would suggest that the replicates of test statistics other than t-statistics are averaged with no attention paid to the variability among them. Sound accurate about HLM? Is that the best we can do?



David Judkins
Senior Statistician
Westat
1650 Research Boulevard
Rockville, MD 20850
(301) 315-5970
[email protected]


==============================
David Crow
Associate Director
Survey Research Center
University of California, Riverside
900 University Avenue
1419 Spieth Hall
Riverside, CA  92521
Tel.:  (951) 827-4028
Fax:  (951) 827-4035
Web: survey.ucr.edu
==============================

"It is the mark of an educated mind to rest satisfied with the degree of precision which the nature of the subject admits and not to seek exactness where only an approximation is possible." Aristotle
