Yes, that's another consideration about the use of the data that I should have mentioned. If a sometimes-missing variable is the outcome in a regression model, the best you can do is to impute it under a correct model and then rediscover that model in the analysis of the data including imputed values. If the same variables and model specification are used for imputation and analysis, nothing is added: you do about the same with or without imputation. If you have an imputation model that brings in additional information, you might gain by using the imputed values. (For a dumb example, suppose Y=weight in pounds, which is sometimes missing, but your imputation model can make use of complete data on weight in grams.) On the other hand, if you impute under a 'bad' model (uncongenial with your analysis, for example one omitting important analytic predictors), you might bias your results.
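As a rough illustration of the "nothing is added" point, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption made up for the illustration: the data-generating model (Y = 1 + 2X + noise), the missingness mechanism (Y missing more often when X is large), the bootstrap used to reflect parameter uncertainty, and the helper names `ols` and `compare`. It imputes Y from the same predictor and the same linear model used in the analysis, then compares the pooled slope with the complete-case slope.

```python
import numpy as np


def ols(x, y):
    """Return (intercept, slope, residual_sd) from a simple least-squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sd = np.sqrt(resid @ resid / (len(y) - 2))
    return beta[0], beta[1], sd


def compare(n=2000, m=50, seed=0):
    """Return (complete_case_slope, multiply_imputed_slope) for simulated data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # Assumed mechanism: Y is missing more often when x is large; x is always observed.
    missing = rng.random(n) < 1.0 / (1.0 + np.exp(-x))
    obs_idx = np.flatnonzero(~missing)

    # Complete-case analysis: regress Y on X using only cases with observed Y.
    _, cc_slope, _ = ols(x[obs_idx], y[obs_idx])

    # Multiple imputation with the same predictor and the same linear model.
    slopes = []
    for _ in range(m):
        # Bootstrap the observed cases so each imputation uses different parameters.
        b = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        a0, a1, sd = ols(x[b], y[b])
        y_imp = y.copy()
        noise = rng.normal(scale=sd, size=int(missing.sum()))
        y_imp[missing] = a0 + a1 * x[missing] + noise
        # Analyze the completed data set and keep the slope.
        slopes.append(ols(x, y_imp)[1])

    # Pool the point estimates by averaging across imputations.
    return float(cc_slope), float(np.mean(slopes))


if __name__ == "__main__":
    cc, mi = compare()
    print(f"complete-case slope: {cc:.3f}")
    print(f"imputed-Y slope:     {mi:.3f}")
```

Under these assumptions the two slopes should come out close to each other, since the imputed Y values carry no information beyond what the observed cases already supplied. Adding an auxiliary variable to the imputation step only (like the grams example) is where one would expect a difference.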
In practice we don't always think through these considerations if the imputation model is pretty good and the analysis is complex, with the same variables appearing multiple times, sometimes as predictors, sometimes as outcomes, or in univariate descriptions.

________________________________
From: Impute -- Imputations in Data Analysis [[email protected]] On Behalf Of Jonathan Mohr [[email protected]]
Sent: Friday, March 28, 2014 8:21 AM
To: [email protected]
Subject: Re: Impute invalid data?

Alan,

Our situation is the latter you identified, where we are "interested in each person's "normal" uninfected CRP level but it is missing for some because they were infected temporarily when the measure was taken." And my thought was the same as yours: that we "might discard the uninformative data on levels when infected and impute missing "normal" values."

This said, you may have seen Paul von Hippel's message regarding his study, which provides evidence that using data from participants with imputed Y values for multiple regression (in a multiple imputation analysis) may not be a particularly good strategy. When there are missing values on both Xs and Y, he recommends (a) creating multiple imputed datasets using all available data but then (b) dropping all cases with missing Y values for the actual analysis. I asked him how this method compares to FIML missing data methods, and he pointed me to another of his publications suggesting that FIML outperforms all MI strategies when there are missing values (in terms of both efficiency and bias).

Jon
