I'm seeking information on the use of auxiliary variables in multiple 
imputation -- that is, including variables in the imputation model that 
won't appear in the analysis model. For example, suppose I intend to 
regress Y on X, where (X,Y) are bivariate normal with ignorable 
missingness. Consider two choices in specifying the imputation model.

(1) I could impute under the assumption that (X,Y) is bivariate normal.
(2) Or I could impute under the assumption that (X,Y,Z) is trivariate 
normal where Z is a normal variable that explains some extra variance in X 
or Y.

Either choice -- (1) or (2) -- will lead to confidence-valid regression 
estimates. But under which choice will the regression estimates be more 
efficient, and why? I'm guessing that the answer depends on how good a 
predictor Z is.
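To make the comparison concrete, here is a hypothetical simulation sketch (not from the post, and not any specific published method). It assumes a strong auxiliary Z, imposes MAR missingness on Y through X, and uses a deliberately simplified regression imputation (fitted value plus residual noise, with no posterior draw of the imputation parameters), so it illustrates the efficiency question rather than proper MI variance estimation. All names and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_ols(x, y):
    """OLS slope of y regressed on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def impute_and_pool(x, y, miss, preds, m=5):
    """Impute missing y by regression on preds, then pool the slope of
    y on x across m imputations (Rubin's rule for the point estimate).
    Simplified: fitted value + noise, no draw of the imputation betas."""
    obs = ~miss
    P = np.column_stack([np.ones(len(x))] + preds)
    beta, *_ = np.linalg.lstsq(P[obs], y[obs], rcond=None)
    fitted = P @ beta
    sigma = np.sqrt(np.mean((y[obs] - fitted[obs]) ** 2))
    slopes = []
    for _ in range(m):
        y_imp = y.copy()
        y_imp[miss] = fitted[miss] + rng.normal(0.0, sigma, miss.sum())
        slopes.append(slope_ols(x, y_imp))
    return np.mean(slopes)

n, reps = 200, 400
est1, est2 = [], []  # choice (1): impute from X only; choice (2): from X and Z
for _ in range(reps):
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    # True slope of Y on X is 1; Z explains ~81% of the residual variance of Y.
    y = x + 0.9 * z + np.sqrt(0.19) * rng.normal(size=n)
    # MAR: probability Y is missing depends only on the observed X.
    miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x))
    est1.append(impute_and_pool(x, y, miss, [x]))
    est2.append(impute_and_pool(x, y, miss, [x, z]))

print("impute from X only: mean slope %.3f, var %.5f" % (np.mean(est1), np.var(est1)))
print("impute from X, Z:   mean slope %.3f, var %.5f" % (np.mean(est2), np.var(est2)))
```

In runs of this sketch, both choices center near the true slope, but the empirical variance of the pooled slope is smaller when Z is included in the imputation model, and the gap shrinks as Z's predictive power is reduced, which is consistent with the guess that the answer depends on how good a predictor Z is.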

Meng (1994) and Schafer (1997) discuss the issue of "uncongenial" input in 
general, but I don't think they answer my more specific question. Any 
references most appreciated.

Best wishes,
Paul von Hippel
