Here's a simplified version of my problem. I'd be grateful for any suggestions.
Say X1 is a dummy variable with missing values, and I intend to regress Y on X1, X2, and their interaction X1X2. The usual and sound advice is that both X1 and X1X2 need to be imputed. But in imputing X1 and X1X2 I run into collinearity problems, because X1 explains 99% of the variation in X1X2. The reason is that X1X2 is identically zero, and so has no residual variation, when X1=0. X1X2 does have substantial residual variation when X1=1, but cases with X1=1 make up only 15% of the data set.

I've used two imputation programs -- MI and IVEware -- and neither can handle the collinearity. MI gives an error message and IVEware gives implausible imputations. Again, I'd be grateful for any suggestions.

Thanks!
Paul von Hippel
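
P.S. In case it helps to see the problem concretely, here is a minimal simulation sketch of the collinearity described above. It is not the real data: the sample size, the normal distribution for X2, and its mean and SD are all assumptions, chosen only so that X1 explains roughly 99% of the variation in X1X2 with about 15% of cases having X1=1. The function name interaction_r2 is made up for illustration.

```python
import numpy as np


def interaction_r2(n=10_000, p=0.15, x2_mean=10.0, x2_sd=1.0, seed=0):
    """Regress X1*X2 on X1 and report R^2 plus residual variance by X1 group.

    All defaults are illustrative assumptions, not values from real data.
    """
    rng = np.random.default_rng(seed)
    x1 = (rng.random(n) < p).astype(float)    # dummy, about 15% ones
    x2 = rng.normal(x2_mean, x2_sd, size=n)   # assumed continuous covariate
    x1x2 = x1 * x2                            # interaction term

    # Ordinary least squares of X1X2 on an intercept and X1.
    design = np.column_stack([np.ones(n), x1])
    coef, *_ = np.linalg.lstsq(design, x1x2, rcond=None)
    resid = x1x2 - design @ coef

    r2 = 1.0 - resid.var() / x1x2.var()
    return r2, resid[x1 == 0].var(), resid[x1 == 1].var()


r2, resid_var_x1_0, resid_var_x1_1 = interaction_r2()
print(f"R^2 of X1X2 on X1:           {r2:.3f}")
print(f"residual variance when X1=0: {resid_var_x1_0:.3g}")
print(f"residual variance when X1=1: {resid_var_x1_1:.3g}")
```

Under these assumptions the R^2 comes out high because the mean of X2 is large relative to its SD, and all of the residual variation sits in the X1=1 cases, which is the pattern I see in my data.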
