If you consider the high values as "wrong", as arising from an infection,
it makes sense to delete them and impute if there is information added by
including the cases. Omitting them is also subject to bias, of course.

In principle if you had repeated measures of normal and infected values,
you could use the relationship to impute normal given infected. But that is
likely a hypothetical situation, I think.

Regarding FIML versus Bayesian MI: as Alan notes, MI can be more efficient
if it includes useful auxiliary variables not in the FIML analysis.
Otherwise Bayesian MI under the same model is asymptotically equivalent to
FIML as the number of imputations M increases, and it can get pretty close
to fully efficient for a small M if the fraction of missing information is
modest.
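
To put rough numbers on "pretty close", here is a minimal sketch using
Rubin's well-known approximation that, on the variance scale, M imputations
have efficiency 1 / (1 + gamma / M) relative to infinitely many, where gamma
is the fraction of missing information. The function name and the values of
gamma and M below are illustrative choices, not taken from this thread.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Approximate efficiency (variance scale) of M imputations relative to
    M = infinity, using Rubin's formula 1 / (1 + gamma / M).

    gamma is the fraction of missing information (assumed known here)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 / (1.0 + gamma / m)


# Illustrative grid: modest to substantial missing information.
for gamma in (0.1, 0.3, 0.5):
    cells = ", ".join(
        f"M={m}: {relative_efficiency(gamma, m):.3f}" for m in (3, 5, 20)
    )
    print(f"gamma={gamma}: {cells}")
```

For example, with gamma = 0.3 and M = 5 the approximation gives 1 / 1.06,
or about 94% efficiency.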

Another often-ignored aspect of FIML is that it is essentially estimating a
posterior mode under a flat prior, whereas MI is more like a posterior mean
(at least in terms of how it is dealing with missing data). When the
posterior distribution is not symmetric (as with a binary variable with
proportion close to zero or one) this distinction matters; one thing I like
about Bayes is that it makes the choice of loss function more front and
center, since it impacts how the posterior distribution is being summarized.
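
A toy illustration of the mode-versus-mean gap, assuming a single binomial
proportion with a flat Beta(1, 1) prior (this is only a sketch of the
asymmetry, not of how FIML or MI software handles missing data): after y
successes in n trials the posterior is Beta(y + 1, n - y + 1), with mode
y / n (the ML estimate) and mean (y + 1) / (n + 2). The function name and
the counts are hypothetical.

```python
def flat_prior_summaries(y: int, n: int) -> tuple[float, float]:
    """Posterior mode and mean of a binomial proportion under a flat
    Beta(1, 1) prior, i.e. a Beta(y + 1, n - y + 1) posterior."""
    if n < 1 or not 0 <= y <= n:
        raise ValueError("need n >= 1 and 0 <= y <= n")
    mode = y / n              # coincides with the maximum likelihood estimate
    mean = (y + 1) / (n + 2)  # pulled toward 1/2 relative to the mode
    return mode, mean


# Proportion near zero: the skewed posterior separates the two summaries.
mode, mean = flat_prior_summaries(1, 50)
print(f"y=1,  n=50: mode={mode:.4f}, mean={mean:.4f}")

# Proportion at one half: the posterior is symmetric and they coincide.
mode, mean = flat_prior_summaries(25, 50)
print(f"y=25, n=50: mode={mode:.4f}, mean={mean:.4f}")
```

With 1 success in 50 trials the mode is 0.02 while the mean is 2/52, about
0.038, nearly twice as large; with 25 successes in 50 both equal 0.5.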

There is a small literature on edit imputation that is relevant to this
question. I have an early paper on ML in JASA 1987 with Phil Smith, and Joe
Schafer and Bonnie Ghosh-Dattar have a paper on MI in this context.

Rod Little
Richard D. Remington Distinguished University Professor
Department of Biostatistics
University of Michigan
M4071 SPH II, 1415 Washington Heights
Ann Arbor MI 48109
[email protected]


On Fri, Mar 28, 2014 at 9:58 AM, Zaslavsky, Alan M. <
[email protected]> wrote:

>  Yes, that's another consideration about the use of the data I should
> have mentioned.  If a sometimes-missing variable is the outcome in a
> regression model, the best you can do is to impute it under a correct model
> and then rediscover that model in the analysis of the data including
> imputed values.  If the same variables and model specification are used for
> imputation and analysis, nothing is added:  you do about the same with or
> without imputation.  If you have an imputation model that brings in
> additional information, you might gain by using the imputed values.  (For a
> dumb example, suppose Y=weight in pounds, which is sometimes missing but
> your imputation model can make use of complete data on weight in grams.)
> On the other hand if you impute under a 'bad' model (uncongenial with your
> analysis, omitting important analytic predictors) you might bias your
> results.
>
> In practice we don't always think through these considerations if the
> imputation model is pretty good and the analysis is complex with the same
> variables appearing multiple times, sometimes as predictors, outcomes, or
> in univariate descriptions.
>
>  ------------------------------
> *From:* Impute -- Imputations in Data Analysis [
> [email protected]] On Behalf Of Jonathan Mohr [
> [email protected]]
> *Sent:* Friday, March 28, 2014 8:21 AM
> *To:* [email protected]
> *Subject:* Re: Impute invalid data?
>
>    Alan,
>  Our situation is the latter you identified, where we are "interested in each
> person's "normal" uninfected CRP level but it is missing for some because
> they were infected temporarily when the measure was taken." And my thought
> was the same as yours: that we "might discard the uninformative data on
> levels when infected and impute missing "normal" values."
>
>  This said, you may have seen Paul von Hippel's message post regarding his
> study providing evidence that using data from participants with imputed Y
> values for multiple regression (in a multiple imputation analysis) may not
> be a particularly good strategy. When there are missing values on both Xs
> and Y, he recommends (a) creating multiple imputed datasets using all
> available data but then (b) dropping data from all cases with missing Y
> values for the actual analysis.
>
>  I asked him how this method compares to FIML missing data methods, and he
> pointed me to another of his publications suggesting that FIML outperforms
> all MI strategies when there are missing values (in terms of both
> efficiency and bias).
> Jon
>
>
