We are still working on our multiple imputation analysis and would appreciate
any comments and guidance.  We imputed data under the multivariate
normal model with SAS PROC MI MCMC for a covariate (X2, a medical test)
that is 20% missing.  We reasoned that because X2 was missing for
administrative reasons, e.g., because the patient was discharged from
the hospital before the research staff was able to obtain X2, the
argument could be made that the data were MAR.  The analysis of interest is
a Cox regression model of time to death.  We put a lot of thought into
building the imputation model and were careful to include other
covariates that were highly correlated with X2 and all those that we
want in the analysis model (note: we did not include time to death because
it is censored and not MVN).  We used 50 imputations for a dataset of
n=682, which is probably overkill, but we have the computing power and
disk storage.

The result: the regression coefficient for the covariate of interest
(X1) was actually lower (i.e., closer to zero) in the MI analysis than in
the complete case analysis.  This result was completely unexpected.
What we expected was simply a more efficient estimate of X1, which had
p=0.10 in the complete case analysis.

To explore these results we did the following:

1.  Ran the Cox regression on the subset of patients missing X2 and
found that the relationship of X1 to outcome was in the opposite
direction to the complete cases, i.e., X1 was protective in the patients
missing X2, while previous research and theory hold that X1 is a risk
factor.

2.  Compared the characteristics of patients missing and not missing X2
and found a mixed bag as far as prognosis, although the patients in the
missing group had some important characteristics that conveyed better
prognosis (e.g., younger).  (I understand this as a check of the MCAR
assumption.)

3.  Examined the imputed values of X2 and found their mean was slightly
lower than that of the observed X2 values.  Lower X2 is usually associated with
worse outcome.

4.  Did a sensitivity analysis by adding and subtracting constants from
the imputed values and found that the resulting MI analyses were more in
line with our hypothesis (i.e., statistically significant harmful effect
of X1) when the imputed values of X2 are inflated.  
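
The shifting step in item 4 can be sketched roughly as follows.  This is
only an illustration in Python, not the SAS code actually used; the
function name, the toy values, and the offsets are all hypothetical.  The
point it shows is that the constant is added to the imputed entries only,
while observed X2 values are left untouched.

```python
def shift_imputed(x2, was_missing, delta):
    """Return a copy of x2 with delta added to the imputed entries only."""
    return [v + delta if m else v for v, m in zip(x2, was_missing)]


# Hypothetical completed dataset: the 2nd and 4th values were imputed.
x2_completed = [5.0, 4.0, 6.0, 3.0]
was_missing = [False, True, False, True]

# One shifted copy of the completed data per (assumed) offset.
for delta in (-1.0, 0.0, 1.0):
    shifted = shift_imputed(x2_completed, was_missing, delta)
    print(delta, shifted)
```

Each shifted copy would then be analyzed with the same Cox model and the
results combined across imputations in the usual way.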

 

Lingering questions

1.  Where do we go from here?!  Certainly, we feel that the MI analysis
is "better" in some sense than the complete case analysis (e.g., uses all
the data), even though the results do not provide statistical support
for our hypothesis (i.e., not "statistically significant" at p<.05).

2.  We are confused by the fact that the patients missing X2 have
characteristics that are associated with better prognosis yet the
imputed values of X2 are lower than the observed data, which implies a
worse prognosis.  Is this something to worry about?

3.  Is there any justification for reversing our initial thoughts about
the MAR assumption and now arguing for MNAR, e.g., because the patients in
the missing X2 group have important differences from the fully observed
patients, there might be unmeasured covariates that independently depend
on X2mis?  

 

From zaslavsk <@t> hcp.med.harvard.edu  Thu Jan 15 07:37:09 2004
From: zaslavsk <@t> hcp.med.harvard.edu (Alan Zaslavsky)
Date: Sun Jun 26 08:25:01 2005
Subject: IMPUTE: thinking about MAR
Message-ID: <[email protected]>


> Subject: IMPUTE: The art of imputation: thinking about MAR
> Date: Wed, 14 Jan 2004 10:09:16 -0600
> From: "Howells, William" <[email protected]>
>
> We put a lot of thought into
> building the imputation model and were careful to include other
> covariates that were highly correlated with X2 and all those that we
> want in the analysis model (note: did not include time to death because
> of censoring and not MVN). 

This is the feature of the analysis that most concerns me.  Essentially
what this does is to assume conditional independence of X2 and time to
death given the other variables, which of course attenuates the
relationship when you analyze a dataset including both observed and
imputed values of X2.  (The impact of this on estimated effect of the
variable of interest X1 is not at all obvious, although you might be able
to figure it out from looking at relationships of X1 and X2, etc.)  I
appreciate that modeling missing data with censored survival data is
nonstandard and therefore messy (perhaps impossible to do "correctly"
with any available standard software), but you are better off including
this crucial relationship with some kind of approximate model than
leaving it out altogether.

To do this using PROC MI, one idea would be to create a few indicators
for survival for 3 months, 6 months, 9 months, etc.  (or whatever is
appropriate to the time scale of your disease process).  Censored
observations have missing indicators for the time points later than time
of censoring.  Then throw this into PROC MI.  You will not use the
imputed values of the missing indicators, but this is a mechanism for
using the censored survival data within an MVN imputation framework.
(There might be some obvious reason why this won't work, but try it and
see.) Of course this model is "wrong" but if higher X2 is actually
associated with better prognosis, then using the outcomes in predicting
X2 should help to predict this.
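
A minimal sketch of how such indicators might be built before handing the
data to the imputation step, written in Python purely for illustration.
The function name and the cutoffs are hypothetical, and the convention
that a patient censored exactly at a cutoff counts as alive at that cutoff
is an assumption, not something stated above.

```python
def survival_indicators(time, died, cutoffs):
    """Indicator of surviving past each cutoff: 1, 0, or None (unknown)."""
    out = []
    for c in cutoffs:
        if time > c or (time == c and not died):
            out.append(1)       # known to be alive at the cutoff
        elif died:
            out.append(0)       # death observed at or before the cutoff
        else:
            out.append(None)    # censored before the cutoff: leave missing
    return out


# Follow-up in months, with cutoffs at 3, 6, and 9 months (assumed).
print(survival_indicators(4.0, True, (3, 6, 9)))   # died at month 4
print(survival_indicators(7.0, False, (3, 6, 9)))  # censored at month 7
```

The None entries play the role of the missing indicators: they would be
imputed along with X2 but then ignored in the analysis.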

> 2.  We are confused by the fact that the patients missing X2 have
> characteristics that are associated with better prognosis yet the
> imputed values of X2 are lower than the observed data, which implies a
> worse prognosis.  Is this something to worry about?  

What the model (as you fit it) is using is the relationship between X2
and the other characteristics, not the relationship between X2 and outcomes.
So it is possible that X2 predicts better prognosis (conditional on
everything else) yet is associated with other characteristics that predict
worse prognosis.

> 3.  Is there any justification for reversing our initial thoughts about
> the MAR assumption and now argue for MNAR, eg., because the patients in
> the missing X2 group have important differences from the fully observed
> patients, there might be unmeasured covariates that independently depend
> on X2mis?  

MNAR can never be demonstrated statistically (by definition) unless the
model is identified by some other unverifiable assumptions, but if the
results under a good MAR model (including survival outcomes) are
scientifically implausible then it is reasonable to want to think hard
about whether the process is MNAR.  What is actually going on when the
patient is discharged without measurement of X2?  Is this because the
recovery was unusually quick, or because the clinician didn't want to
subject a very sick patient to the test?  If you put missingness of X2
into the model as a predictor (instead of X2 itself), is it associated
with survival?  What can the clinicians tell you about what is going on?
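
As a rough first look at that question, one could build the missingness
indicator and compare crude death rates per unit of follow-up time between
the two groups.  The sketch below is in Python with made-up values; the
function name and data layout are hypothetical, and a crude rate is only a
stand-in for actually entering the indicator as a covariate in the Cox
model.

```python
def death_rate_by_missingness(x2, time, died):
    """Crude deaths per unit of follow-up time, split by whether X2 is missing."""
    totals = {0: [0, 0.0], 1: [0, 0.0]}    # indicator -> [deaths, follow-up time]
    for v, t, d in zip(x2, time, died):
        r = 1 if v is None else 0          # missingness indicator for X2
        totals[r][0] += int(d)
        totals[r][1] += t
    return {r: deaths / total for r, (deaths, total) in totals.items() if total > 0}


# Toy data (hypothetical): None marks a missing X2.
x2 = [1.2, None, 0.8, None]
time = [10.0, 5.0, 10.0, 15.0]
died = [True, False, True, True]
print(death_rate_by_missingness(x2, time, died))
```

Key 1 in the result is the group with X2 missing; a clearly different rate
there would be one more reason to ask the clinicians what is going on.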

These are challenging problems ... good luck with your analysis.
