We are still working on our multiple imputation analysis and would appreciate any comments and guidance. We imputed data under the multivariate normal model with SAS PROC MI MCMC for a covariate (X2, a medical test) that is 20% missing. We reasoned that because X2 was missing for administrative reasons, e.g., because the patient was discharged from the hospital before the research staff was able to obtain X2, an argument could be made that the data were MAR. The analysis of interest is a Cox regression model of time to death. We put a lot of thought into building the imputation model and were careful to include other covariates that were highly correlated with X2 and all those that we want in the analysis model (note: we did not include time to death, because it is censored and not MVN). We used 50 imputations for a dataset of n=682, which is probably overkill, but we have the computing power and disk storage.

The result: the regression coefficient for the covariate of interest (X1) was actually lower (i.e., closer to zero) in the MI analysis than in the complete case analysis. This result was completely unexpected. What we expected was simply a more efficient estimate of the X1 coefficient, which had p=0.10 in the complete case analysis.

To explore these results we did the following:

1. Ran the Cox regression on the subset of patients missing X2 and found that the relationship of X1 to outcome was in the opposite direction to the complete cases, i.e., X1 was protective in the patients missing X2, while previous research and theory hold that X1 is a risk factor.
2. Compared the characteristics of patients missing and not missing X2 and found a mixed bag as far as prognosis, although the patients in the missing group had some important characteristics that conveyed better prognosis (e.g., younger). (I understand this as a test of the MCAR assumption.)
3. Examined the imputed values of X2 and found the mean was slightly lower than that of the observed X2 values. Lower X2 is usually associated with worse outcome.
4. Did a sensitivity analysis by adding and subtracting constants from the imputed values and found that the resulting MI analyses were more in line with our hypothesis (i.e., a statistically significant harmful effect of X1) when the imputed values of X2 were inflated.

Lingering questions:

1. Where do we go from here?! Certainly, we feel that the MI analysis is "better" in some sense than the complete case analysis (e.g., it uses all the data), even though the results do not provide statistical support for our hypothesis (i.e., not "statistically significant" at p<.05).
2. We are confused by the fact that the patients missing X2 have characteristics that are associated with better prognosis yet the imputed values of X2 are lower than the observed data, which implies a worse prognosis. Is this something to worry about?
3. Is there any justification for reversing our initial thoughts about the MAR assumption and now arguing for MNAR, e.g., because the patients in the missing X2 group have important differences from the fully observed patients, there might be unmeasured covariates that independently depend on X2mis?

From zaslavsk <@t> hcp.med.harvard.edu Thu Jan 15 07:37:09 2004
From: zaslavsk <@t> hcp.med.harvard.edu (Alan Zaslavsky)
Date: Sun Jun 26 08:25:01 2005
Subject: IMPUTE: thinking about MAR
Message-ID: <[email protected]>

> Subject: IMPUTE: The art of imputation: thinking about MAR
> Date: Wed, 14 Jan 2004 10:09:16 -0600
> From: "Howells, William" <[email protected]>
>
> We put a lot of thought into
> building the imputation model and were careful to include other
> covariates that were highly correlated with X2 and all those that we
> want in the analysis model (note: did not include time to death because
> of censoring and not MVN).

This is the feature of the analysis that most concerns me. Essentially, what this does is assume conditional independence of X2 and time to death given the other variables, which of course attenuates the relationship when you analyze a dataset including both observed and imputed values of X2. (The impact of this on the estimated effect of the variable of interest, X1, is not at all obvious, although you might be able to figure it out from looking at relationships of X1 and X2, etc.) I appreciate that modeling missing data with censored survival data is nonstandard and therefore messy (perhaps impossible to do "correctly" with any available standard software), but you are better off including this crucial relationship with some kind of approximate model than leaving it out altogether.

To do this using PROC MI, one idea would be to create a few indicators for survival for 3 months, 6 months, 9 months, etc. (or whatever is appropriate to the time scale of your disease process). Censored observations have missing indicators for the time points later than time of censoring. Then throw this into PROC MI. You will not use the imputed values of the missing indicators, but this is a mechanism for using the censored survival data within an MVN imputation framework. (There might be some obvious reason why this won't work, but try it and see.) Of course this model is "wrong" but if higher X2 is actually associated with better prognosis, then using the outcomes in predicting X2 should help to predict this.

> 2. We are confused by the fact that the patients missing X2 have
> characteristics that are associated with better prognosis yet the
> imputed values of X2 are lower than the observed data, which implies a
> worse prognosis. Is this something to worry about?

What the model (as you fit it) is using is the relationship between X2 and the other characteristics, not the relationship between X2 and outcomes. So it is possible that X2 predicts better prognosis (conditional on everything else) yet is associated with other characteristics that predict worse prognosis.

> 3. Is there any justification for reversing our initial thoughts about
> the MAR assumption and now argue for MNAR, eg., because the patients in
> the missing X2 group have important differences from the fully observed
> patients, there might be unmeasured covariates that independently depend
> on X2mis?

MNAR can never be demonstrated statistically (by definition) unless the model is identified by some other unverifiable assumptions, but if the results under a good MAR model (including survival outcomes) are scientifically implausible then it is reasonable to want to think hard about whether the process is MNAR. What is actually going on when the patient is discharged without measurement of X2?
Is this because the recovery was unusually quick, or because the clinician didn't want to subject a very sick patient to the test? If you put missingness of X2 into the model as a predictor (instead of X2 itself), is it associated with survival? What can the clinicians tell you about what is going on? These are challenging problems ... good luck with your analysis.
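
For what it is worth, the survival-indicator idea described above (one indicator per time point, left missing when the patient was censored before that point) could be sketched roughly as follows. This is only an illustration in plain Python rather than SAS, and everything named here is an assumption made for the example: the function name survival_indicators, the 3/6/9-month cutpoints, the toy patients, and the convention that a death exactly at a cutpoint counts as not surviving past it. The same logic could be written as a SAS DATA step before running PROC MI.

```python
from typing import List, Optional


def survival_indicators(time: float, died: bool,
                        cutpoints: List[float]) -> List[Optional[int]]:
    """Build one survival indicator per cutpoint for a single patient.

    1    = known to be alive past the cutpoint
    0    = known to have died at or before the cutpoint
    None = censored before the cutpoint, so status is unknown (missing)
    """
    indicators: List[Optional[int]] = []
    for c in cutpoints:
        if died:
            # Death observed: status is known at every cutpoint.
            indicators.append(1 if time > c else 0)
        elif time >= c:
            # Censored at or after the cutpoint: still alive at the cutpoint.
            indicators.append(1)
        else:
            # Censored before the cutpoint: leave the indicator missing.
            indicators.append(None)
    return indicators


# Hypothetical follow-up times in months; cutpoints are illustrative only.
cutpoints = [3.0, 6.0, 9.0]
patients = [
    {"id": 1, "time": 7.0, "died": True},    # died at 7 months
    {"id": 2, "time": 4.0, "died": False},   # censored at 4 months
    {"id": 3, "time": 10.0, "died": False},  # censored at 10 months
]

for p in patients:
    p["surv"] = survival_indicators(p["time"], p["died"], cutpoints)
    print(p["id"], p["surv"])
```

The indicator columns would then go into the imputation model alongside X2 and the other covariates. As noted above, the imputed values of the indicators themselves are not used afterwards; they are only there so that the outcome information can help predict X2.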
