A few months ago, I posted a note asking how to estimate R^2 (and other 
quantities) when missing values have been multiply imputed. A respondent 
suggested that I use the same strategy used for the regression coefficients: 
compute a point estimate in each imputed data set and average the estimates.
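The suggested pooling step (the point-estimate part of what is usually called Rubin's rules) is just an arithmetic mean over the M imputed data sets. A minimal Python sketch, assuming the per-imputation estimates have already been computed; `pool_estimate` is a hypothetical helper name, not part of any imputation package:

```python
from statistics import fmean


def pool_estimate(per_imputation_estimates):
    """Pool one quantity across M imputed data sets by averaging the M point
    estimates (hypothetical helper; mirrors the point-estimate step of
    Rubin's rules and ignores the variance-combining step)."""
    estimates = list(per_imputation_estimates)
    if not estimates:
        raise ValueError("need at least one imputed data set")
    return fmean(estimates)


# Hypothetical slope estimates from M = 3 imputed data sets.
pooled_slope = pool_estimate([0.8, 1.0, 1.2])
print(pooled_slope)
```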

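Today I began to have doubts about this. Consider the regression Y = rX + e, 
where X and Y are standard normal variables, so that r is their correlation 
and R^2 = r^2. Under the suggested strategy, R^2 would be estimated by 
averaging the estimates of R^2 = r^2 across the imputations, while r itself 
is estimated by averaging the estimates of r. In general these two estimates 
will not agree: the average of the squared estimates exceeds the square of 
the averaged estimate by the between-imputation variance of the estimates of 
r, so the estimate of R^2 will be greater than the squared estimate of r, 
whatever the sign of r. And if the estimator of r is unbiased, then the 
proposed estimate of R^2 must be biased upward, since the expected square of 
an unbiased estimator of r is r^2 plus its variance.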
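A small numerical check of the point above: for per-imputation estimates r_1, ..., r_M with mean rbar, the mean of the r_m^2 equals rbar^2 plus (1/M) * sum (r_m - rbar)^2, which can never be negative. The sketch below uses made-up estimates purely for illustration (they are not output from any real imputation), and `compare_pooling` is a hypothetical name:

```python
from statistics import fmean, pvariance


def compare_pooling(r_estimates):
    """Return (square of pooled r, pooled R^2, between-imputation variance).

    r_estimates holds one estimate of r per imputed data set. Pooled r is the
    mean of the r_m; pooled R^2 is the mean of the r_m^2 (the suggested
    strategy). The variance here uses divisor M, not the M - 1 used for
    Rubin's between-imputation variance, so that the identity
    mean(r^2) = mean(r)^2 + variance holds exactly.
    """
    r_bar = fmean(r_estimates)
    mean_r2 = fmean([r * r for r in r_estimates])
    between = pvariance(r_estimates, mu=r_bar)
    return r_bar ** 2, mean_r2, between


# Hypothetical per-imputation estimates of r (illustrative, not real data).
r_m = [0.42, 0.47, 0.39, 0.45, 0.50]
sq_of_mean, mean_of_sq, between = compare_pooling(r_m)
print(f"(mean of r_m)^2 = {sq_of_mean:.6f}")
print(f"mean of r_m^2   = {mean_of_sq:.6f}")
print(f"difference      = {mean_of_sq - sq_of_mean:.6f}")
print(f"between var.    = {between:.6f}")
```

Because the gap is a variance, it vanishes only when every imputation yields the same estimate of r.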

It strikes me that there must be many quantities for which this procedure 
cannot yield unbiased estimates. Pertinent citations would be most 
appreciated.

Best wishes,
Paul von Hippel
Statistician
Ohio State University
