Cc: [email protected]
Subject: Imputation with regression

The problem with imputation under a model with interactions is that the relationships are not linear, as is assumed under the normal model. Rubin has commented (in a seminar; I don't know whether this is published) that you might just stick in the product of the variables as another variable, and then the imputation of the missing values would essentially use a linear approximation to the product. If the interaction is a very important part of the model, this might not be such a great approximation. Another approach is that used in IVEware (from U of Michigan), which uses a collection of univariate regression models to impute each variable in turn. This gives some additional flexibility in specifying each model, for example by including some interactions when predicting some variables from others, although the collection of models is unlikely to be entirely consistent. How to specify a consistent joint model with interactions in the conditional (regression) models is not at all obvious.
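Here is a minimal sketch of both ideas, under made-up assumptions (hypothetical variables x1, x2, y, with x1 sometimes missing). The product x1*x2 is appended as "just another variable," and a chained-equations imputer (each incomplete variable regressed on the others in turn, in the spirit of IVEware; here sklearn's IterativeImputer stands in) fills in the missing values. This is an illustration of the idea, not the actual procedure either Rubin or IVEware uses.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.4 * x2 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})
df["x1x2"] = df["x1"] * df["x2"]                       # the product, entered as its own column
df.loc[rng.random(n) < 0.3, ["x1", "x1x2"]] = np.nan   # x1 (and hence the product) missing

# Chained univariate regressions: each incomplete column is imputed from the others.
imputer = IterativeImputer(sample_posterior=True, random_state=0)
completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# Note: the imputed x1x2 values are only a linear approximation to the product and
# will not in general equal completed["x1"] * completed["x2"] -- the inconsistency
# mentioned above.
```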
On the centering of the variables in the interactions, you have given two perfectly reasonable alternatives. Under one, your published statement would be "the effect of X1 when X2 is fixed at its population mean value is Beta"; under the other, you would say "the effect of X1 is Beta when X2 is fixed at M," where M is the complete-data mean. Both statements are reasonable and unlikely to differ much. I incline somewhat toward the latter, since the estimated (but unknown) population mean has no particular importance, while the effect at a fixed value that is close to the mean is readily interpretable.
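The algebra behind both statements, written generically with centering constants c1 and c2 (these symbols are mine, not from your message):

```latex
\[
  E[Y] = \beta_0 + \beta_1 (X_1 - c_1) + \beta_2 (X_2 - c_2)
       + \beta_3 (X_1 - c_1)(X_2 - c_2),
\qquad
  \frac{\partial\, E[Y]}{\partial X_1} = \beta_1 + \beta_3 (X_2 - c_2),
\]
```

so the main-effect coefficient Beta1 equals the effect of X1 exactly when X2 = c2; taking c2 to be the population mean gives the first statement, and taking it to be the complete-data mean M gives the second.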
