> Does anyone know whether interaction terms (for categorical variables)
> should be included in the multiple imputation process, versus just being
> created after the dataset has been imputed? I have been taking the latter
> approach out of concern about potential internal inconsistencies in the
> data (e.g. a separately imputed interaction term is imputed as a "1" when
> one of the individual effect terms is imputed as a "0" for the same
> observation).
Craig, if your complete-data model includes interactions, then I would say that your imputation model should include them as well. Omitting them during imputation will bias the estimated interaction towards zero, and the more data are missing, the larger that bias becomes.

As you noticed, consistency problems can occur, especially if you use a variable-by-variable imputation approach. There are several routes you could follow. One is to model the entire joint distribution with a loglinear model that includes whatever interactions you like, as in Schafer's cat approach. Another is to add the interaction term to the predictor set and make sure that it is recomputed as soon as either of its component variables is imputed. The latter is possible with the 'passive option' in MICE; see pages 12-13 of http://web.inter.nl.net/users/S.van.Buuren/mi/docs/Manual.pdf for examples.

Best,
Stef van Buuren
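The passive idea can be sketched in a few lines of Python. This is not MICE itself and does not reproduce its interface; it is a toy chained-equations loop under stated assumptions. The columns `x1`, `x2`, their product `x1x2` and a binary outcome `y` are made up, `hot_deck` is a hypothetical stand-in for whatever per-variable imputation model you would really use, and only a single imputation is produced (no parameter draws, no pooling). The point is the contrast: imputing the product column like any other variable yields rows where `x1x2 != x1 * x2`, while recomputing it right after each component is imputed keeps it consistent and still lets it act as a predictor for `y`.

```python
import copy
import random


def make_rows(n, miss_rate, rng):
    """Toy data (assumed): binary x1, x2 and an outcome y that depends on their
    product. x1, x2 and y are each missing completely at random; the stored
    product x1x2 is treated as missing whenever either component is."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
        y = 1 if rng.random() < 0.2 + 0.6 * x1 * x2 else 0
        miss = {v: rng.random() < miss_rate for v in ("x1", "x2", "y")}
        miss["x1x2"] = miss["x1"] or miss["x2"]
        vals = {"x1": x1, "x2": x2, "y": y, "x1x2": x1 * x2}
        row = {v: None if miss[v] else vals[v] for v in vals}
        row["miss"] = miss
        rows.append(row)
    return rows


def hot_deck(rows, var, keys, rng):
    """Hypothetical stand-in imputer: every originally missing `var` is copied
    from a random donor that observed `var` and matches on the current values
    of the `keys` columns (keys=() means one pool of all observed values)."""
    donors = {}
    for r in rows:
        if not r["miss"][var]:
            donors.setdefault(tuple(r[k] for k in keys), []).append(r[var])
    for r in rows:
        if r["miss"][var]:
            r[var] = rng.choice(donors[tuple(r[k] for k in keys)])


def passive_update(rows):
    """Passive step: x1x2 is recomputed from its components, never imputed itself."""
    for r in rows:
        r["x1x2"] = r["x1"] * r["x2"]


def impute_with_passive(rows, rng, iterations=5):
    for var in ("x1", "x2", "y"):            # starting values: unconditional draws
        hot_deck(rows, var, (), rng)
    passive_update(rows)
    for _ in range(iterations):
        hot_deck(rows, "x1", ("y", "x2"), rng)
        passive_update(rows)                 # refresh the product right after x1 changes
        hot_deck(rows, "x2", ("y", "x1"), rng)
        passive_update(rows)                 # ... and right after x2 changes
        hot_deck(rows, "y", ("x1x2",), rng)  # the product is in y's predictor set


def impute_independently(rows, rng):
    """The problematic alternative: the product is imputed like any other column."""
    for var in ("x1", "x2", "x1x2", "y"):
        hot_deck(rows, var, (), rng)


def inconsistent(rows):
    """Number of rows whose stored product disagrees with its components."""
    return sum(r["x1x2"] != r["x1"] * r["x2"] for r in rows)


rng = random.Random(7)
data = make_rows(2000, 0.3, rng)
naive = copy.deepcopy(data)
impute_independently(naive, rng)
passive = copy.deepcopy(data)
impute_with_passive(passive, rng)
print("inconsistent rows, imputed independently:", inconsistent(naive))
print("inconsistent rows, passive imputation:  ", inconsistent(passive))
```

The passive step is cheap because the product is a deterministic function of its components; what matters is where it sits in the loop, so that any model using `x1x2` as a predictor always sees a value that agrees with the current `x1` and `x2`.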
