Hi, I have just finished simulating longitudinal data with different proportions of MAR and NINR missingness, and I am trying to see which method of analysis gives estimates with the smallest bias and smallest MSE, as part of my doctoral dissertation. The NINR data were simulated using a shared parameter model. The results so far suggest that Proc Mixed performs best when compared with complete case analysis and with a new method I tried, which separated the data set into its complete case, MAR, and NINR components, estimated the parameters separately, and then combined the estimates using weights that approximate variances.

The results also indicate that, depending on the parameters of the NINR distribution, the mean slope of the overall data set is shifted away from the simulated value. Thus, if one analyzes the data before creating the missing values, the estimated treatment slope is not necessarily the simulated value of 2 (in my case) but can be under or over it (1.75, 2.0, or 2.25), depending on the parameter of my dropout distribution. If I analyze the entire data set after creating the missing values (MAR, NINR, and complete cases combined), my slope is closer to the value I would get if no data were missing but some of the data had a pdf that was the product of the normal distribution and the dropout distribution, i.e. the 1.75, 2.0, or 2.25 above rather than the simulated 2. On the other hand, if I analyze just the complete case data, or a data set that is a mixture of complete case and MAR data, my treatment slope remains close to the simulated value of 2.
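To make the combining step of the three-component method concrete, here is a minimal Python sketch. It assumes the weights are inverse variances, which is only one reading of "weights that approximate variances"; the function name `combine_estimates` and the example numbers are hypothetical and are not taken from the simulation.

```python
def combine_estimates(estimates, variances):
    """Pool component estimates using inverse-variance weights.

    Assumption of this sketch: each component (complete case, MAR, NINR)
    contributes a slope estimate and an approximate variance, and the
    weight of a component is 1 / variance.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    # Variance of the pooled estimate if the components are independent.
    pooled_variance = 1.0 / total_weight
    return pooled, pooled_variance


# Hypothetical slopes and variances for the complete case, MAR,
# and NINR components (illustrative values only).
slopes = [2.0, 2.0, 1.75]
variances = [0.01, 0.02, 0.04]
pooled, pooled_variance = combine_estimates(slopes, variances)
print(pooled, pooled_variance)
```

With inverse-variance weights the least precise component pulls the pooled slope the least, so a noisy NINR component would shift the combined estimate only slightly.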
I would like to suggest, then, that one way to test whether the missing data include any NINR is to analyze the entire data set, incomplete and complete cases together, using a procedure such as Proc Mixed that allows unbalanced data, and then to analyze only the complete cases. If the two treatment slopes are significantly different from one another, there is a good chance that at least some of the missing data follow a missing data distribution that affects the parameter estimate. I tried this on data sets in which as little as 1/3 of the missing data were NINR and on data sets in which as much as 2/3 were NINR. In all cases a simple two-sample t test shows a significant difference between the treatment slopes when NINR data are involved (p < .0001) and no significant difference when only MAR data are involved (p > .25). I hope this explanation is clear, because I would like to know whether anyone can see a flaw in my logic that I am not able to see.

Thank you,
Shreelatha
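For anyone who wants to try the slope comparison, here is a minimal Python sketch of a two-sample (Welch) t statistic applied to slope estimates from simulation replicates. The function name `welch_t` and the example slopes are hypothetical. The sketch treats the two sets of estimates as independent samples and uses a normal approximation for the p-value, both of which are assumptions of the sketch rather than details of the analysis described above.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom, approximate two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    # Normal approximation to the p-value; adequate only with many
    # replicates. A t distribution with df degrees of freedom is the
    # exact reference and would be needed for small samples.
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical treatment slopes from five replicates (illustrative only).
all_data_slopes = [1.74, 1.77, 1.73, 1.76, 1.75]       # full unbalanced data
complete_case_slopes = [2.01, 1.98, 2.02, 1.99, 2.00]  # completers only
t_stat, df, p_approx = welch_t(all_data_slopes, complete_case_slopes)
print(t_stat, df, p_approx)
```

Because the complete cases are a subset of the full data set, the two slope estimates within a replicate are not independent, so a paired comparison across replicates may be worth considering as an alternative.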
