Yes, that's another consideration about the use of the data I should have 
mentioned.  If a sometimes-missing variable is the outcome in a regression 
model, the best you can do is to impute it under a correct model and then 
rediscover that model in the analysis of the data including imputed values.  If 
the same variables and model specification are used for imputation and 
analysis, nothing is added:  you do about the same with or without imputation.  
If you have an imputation model that brings in additional information, you 
might gain by using the imputed values.  (For a dumb example, suppose Y=weight 
in pounds, which is sometimes missing but your imputation model can make use of 
complete data on weight in grams.)  On the other hand, if you impute under a 
'bad' model (one that is uncongenial with your analysis, for example because it 
omits important analytic predictors), you might bias your results.
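
A small simulation can make the first and last points concrete. This is only a 
sketch under assumptions of my own choosing: one predictor x, a true model 
Y = 2 + 3x + noise, Y missing completely at random for about 40% of cases, and 
a simplified ("improper") imputation that adds residual noise but does not draw 
the regression parameters from their posterior. The names ols, pooled_slope and 
the rest are just illustrative.

```python
import random
import statistics

def ols(xs, ys):
    """Simple linear regression; returns (intercept, slope, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5

rng = random.Random(0)
n, M = 2000, 20  # sample size and number of imputations (assumed values)

# Assumed data-generating model: Y = 2 + 3x + N(0, 1) noise.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]
observed = [rng.random() > 0.4 for _ in range(n)]  # Y missing completely at random

# Complete-case analysis: drop every case with missing Y.
x_obs = [xi for xi, o in zip(x, observed) if o]
y_obs = [yi for yi, o in zip(y, observed) if o]
a_cc, b_cc, s_cc = ols(x_obs, y_obs)

def pooled_slope(impute):
    """Average the analysis slope over M completed datasets."""
    slopes = []
    for _ in range(M):
        y_done = [yi if o else impute(xi) for xi, yi, o in zip(x, y, observed)]
        slopes.append(ols(x, y_done)[1])
    return statistics.fmean(slopes)

# Imputation model identical to the analysis model (Y on x).
b_same = pooled_slope(lambda xi: a_cc + b_cc * xi + rng.gauss(0, s_cc))

# 'Bad' imputation model that omits x and uses only the marginal of Y.
y_mean, y_sd = statistics.fmean(y_obs), statistics.stdev(y_obs)
b_bad = pooled_slope(lambda xi: rng.gauss(y_mean, y_sd))

print(f"complete-case slope:         {b_cc:.3f}")
print(f"same-model imputation slope: {b_same:.3f}")
print(f"x-omitted imputation slope:  {b_bad:.3f}")
```

Under these assumptions the same-model slope should land very close to the 
complete-case slope, while the imputation that omits x should pull the slope 
toward zero, by roughly the fraction of cases with missing Y.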

In practice we don't always think through these considerations if the 
imputation model is pretty good and the analysis is complex, with the same 
variables appearing multiple times, sometimes as predictors, sometimes as 
outcomes, and sometimes in univariate descriptions.

________________________________
From: Impute -- Imputations in Data Analysis 
[[email protected]] On Behalf Of Jonathan Mohr [[email protected]]
Sent: Friday, March 28, 2014 8:21 AM
To: [email protected]
Subject: Re: Impute invalid data?

Alan,
Our situation is the latter one you identified, where we are "interested in each 
person's 'normal' uninfected CRP level but it is missing for some because they 
were infected temporarily when the measure was taken." And my thought was the 
same as yours: that we "might discard the uninformative data on levels when 
infected and impute missing 'normal' values."

This said, you may have seen Paul von Hippel's post about his study, which 
provides evidence that using cases with imputed Y values in a multiple 
regression (in a multiple imputation analysis) may not be a particularly good 
strategy. When there are missing values on both Xs and Y, he recommends 
(a) creating multiple imputed datasets using all available data but then 
(b) dropping all cases with missing Y values for the actual analysis.

I asked him how this method compares to FIML missing data methods, and he 
pointed me to another of his publications suggesting that FIML outperforms all 
MI strategies when there are missing values (in terms of both efficiency and 
bias).
Jon
