I'm seeking information on the use of auxiliary variables in multiple imputation -- that is, variables included in the imputation model that won't be part of the analysis model. For example, suppose I intend to regress Y on X, where (X,Y) are bivariate normal with ignorable missingness. Consider two choices in specifying the imputation model.
(1) I could impute under the assumption that (X,Y) is bivariate normal.

(2) Or I could impute under the assumption that (X,Y,Z) is trivariate normal, where Z is a normal variable that explains some extra variance in X or Y.

Either choice -- (1) or (2) -- will lead to confidence-valid regression estimates. But under which choice will the regression estimates be more efficient, and why? I'm guessing that the answer depends on how good a predictor Z is. Meng (1994) and Schafer (1997) discuss the issue of "uncongenial" input in general, but I don't think they answer my more specific question. Any references most appreciated.

Best wishes,
Paul von Hippel
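
P.S. To make the comparison concrete, here is a rough Python simulation sketch of the setup I have in mind (a toy version only; the sample size, correlations, 50% MCAR missingness in Y, and the bootstrap-based normal imputer are all illustrative assumptions on my part). It imputes Y either from X alone (choice 1) or from X and Z (choice 2), fits the analysis model Y ~ X on each completed data set, and pools the slope with Rubin's rules, so the pooled variances under the two choices can be compared directly.

# Rough sketch, not production code: simulate (X, Y, Z) trivariate normal,
# delete half of Y completely at random, multiply impute Y from X alone or
# from X and the auxiliary Z, and pool the slope of Y on X by Rubin's rules.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, rho_xy=0.5, rho_yz=0.7):
    # (X, Y, Z) trivariate normal; Z predicts Y but is not in the analysis model.
    cov = np.array([[1.0, rho_xy, 0.0],
                    [rho_xy, 1.0, rho_yz],
                    [0.0, rho_yz, 1.0]])
    X, Y, Z = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    return X, Y, Z

def ols(y, preds):
    # OLS of y on the columns of preds (intercept added); return the
    # coefficients and their estimated covariance matrix.
    A = np.column_stack([np.ones(len(y)), preds])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    return beta, sigma2 * np.linalg.inv(A.T @ A)

def impute_once(y, preds, miss):
    # One imputation: bootstrap the observed cases to reflect parameter
    # uncertainty, fit the normal linear imputation model, then fill in the
    # missing y values with predicted mean plus residual noise.
    obs_idx = np.flatnonzero(~miss)
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    A_boot = np.column_stack([np.ones(boot.size), preds[boot]])
    beta, *_ = np.linalg.lstsq(A_boot, y[boot], rcond=None)
    resid = y[boot] - A_boot @ beta
    sigma = np.sqrt(resid @ resid / (boot.size - A_boot.shape[1]))
    y_imp = y.copy()
    A_mis = np.column_stack([np.ones(miss.sum()), preds[miss]])
    y_imp[miss] = A_mis @ beta + rng.normal(0.0, sigma, miss.sum())
    return y_imp

def pooled_slope(X, y, miss, imp_preds, M=50):
    # Impute M times, fit the analysis model Y ~ X each time, pool by Rubin's
    # rules: total variance T = W + (1 + 1/M) * B.
    ests, wvars = [], []
    for _ in range(M):
        y_m = impute_once(y, imp_preds, miss)
        beta, cov_beta = ols(y_m, X[:, None])
        ests.append(beta[1])
        wvars.append(cov_beta[1, 1])
    ests, wvars = np.array(ests), np.array(wvars)
    W, B = wvars.mean(), ests.var(ddof=1)
    return ests.mean(), W + (1 + 1 / M) * B

n = 2000
X, Y, Z = make_data(n)
miss = rng.random(n) < 0.5            # 50% of Y missing completely at random
Y_obs = Y.copy()
Y_obs[miss] = np.nan
b1, T1 = pooled_slope(X, Y_obs, miss, X[:, None])               # choice (1): X only
b2, T2 = pooled_slope(X, Y_obs, miss, np.column_stack([X, Z]))  # choice (2): X and Z
print(f"(1) impute from X      : slope {b1:.3f}, pooled variance {T1:.5f}")
print(f"(2) impute from X and Z: slope {b2:.3f}, pooled variance {T2:.5f}")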
