Hi Paul,
On Thu, 19 Feb 2004, Paul von Hippel wrote:

> Rubin (1987, Table 4.2) shows that even with .9 missing information,
> confidence intervals using as few as 5 imputations will have close to their
> nominal level of coverage. But increasing M beyond 5 has benefits
> nonetheless. It increases df, narrowing confidence intervals while
> maintaining their coverage levels.
>

Right.

> A while back, I simulated 10,000 observations where X1 and X2 were complete
> and independent, and half the Y values were missing as a function of X1.
> Because of the high missingness, the regression parameters had only 11-16
> df, even when I used M=20 imputations.
>

Depends on the relationship between Y and X1 and the way the Y values
are missing. Sounds like you created a large fraction of missing info,
so my guess is that you deleted essentially all of the Ys with any
leverage for estimating the slope on X1, i.e., all Y with X1s far from
its mean, and therefore also depends on the distn of X1 etc. Was it all
gaussian?

> This struck me as odd, since when only Y is missing, and missing at random,
> maximum likelihood regression estimates are the same as those obtained from
> listwise deletion. The listwise estimates would have ~5000 df, and it seems
> strange that the MI df would be so much lower.
>

Still seems a bit odd, but that would depend on the specific distns and
missingness rule.

Best, Don

> Best wishes,
> Paul von Hippel
>
> At 11:31 AM 2/19/2004, Paul Allison wrote:
> >Some further thoughts:
> >
> >1. The arguments I've seen for using around five imputations are based
> >on efficiency calculations for the parameter estimates. But what about
> >standard errors and p-values? I've found them to be rather unstable for
> >moderate to large fractions of missing information.
> >
> >2. Joe Schafer told me several months ago that he had a dissertation
> >student whose work showed that substantially larger numbers of
> >imputations were often required for good inference. But I don't know
> >any of the details.
> >
> >3. For these reasons, I've adopted the following rule of thumb: Do a
> >sufficient number of imputations to get the estimated DF over 100 for
> >all parameters of interest. I'd love to know what others think of this.
> >
> >
> >----------------------------------------------------------------
> >Paul D. Allison, Professor & Chair
> >Department of Sociology
> >University of Pennsylvania
> >3718 Locust Walk
> >Philadelphia, PA 19104-6299
> >voice: 215-898-6717 or 215-898-6712
> >fax: 215-573-2081
> >[email protected]
> >http://www.ssc.upenn.edu/~allison
> >
> >
> >I'm baffled too on both counts. Modest numbers of imputations work fine
> >unless the fractions of missing information are very high (> 50%), and
> >then I wouldn't think of those situations as missing data problems
> >except in a formal sense. And the number of them is a random
> >variable??? I guess we'll have to read what they wrote...
> >
> >On Thu, 19 Feb 2004, Howells, William wrote:
> >
> > > I came across a note from Hershberger and Fisher on the number of
> > > imputations (citation below), where they conclude that a much larger
> > > number of imputations is required (over 500 in some cases) than the
> > > usual rule of thumb that a relatively small number of imputations is
> > > needed (say 5 to 20 per Rubin 1987, Schafer 1997). They argue that
> > > the traditional rules of thumb are based on simulations rather than
> > > sampling theory. Their calculations assume that the number of
> > > imputations is a random variable from a uniform distribution and use a
> > > formula from Levy and Lemeshow (1999) n >= (z**2)(V**2)/e**2, where n
> > > is the number of imputations, z is a standard normal variable, V**2 is
> > > the squared coefficient of variation (~1.33) and e is the "amount of
> > > error, or the degree to which the predicted number of imputations
> > > differs from the optimal or "true" number of imputations". For
> > > example, with z=1.96 and e=.10, n=511 imputations are required.
> > >
> > > I'm having difficulty conceiving of the number of imputations as a
> > > random variable. What does "true" number of imputations mean? Is
> > > this argument legitimate? Should I be using 500 imputations instead
> > > of 5?
> > >
> > > Bill Howells, MS
> > > Behavioral Medicine Center
> > > Washington University School of Medicine
> > > St Louis, MO
> > >
> > > Hershberger SL, Fisher DG (2003), Note on determining the number of
> > > imputations for missing data, Structural Equation Modeling, 10(4):
> > > 648-650.
> > >
> > > http://www.leaonline.com/loi/sem
> > >
> >
> >--
> >Donald B. Rubin
> >John L. Loeb Professor of Statistics
> >Chairman Department of Statistics
> >Harvard University
> >Cambridge MA 02138
> >Tel: 617-495-5498 Fax: 617-496-8057
>
> Paul von Hippel
> Department of Sociology / Initiative in Population Research
> Ohio State University
> 300 Bricker Hall
> 190 N. Oval Mall
> Columbus OH 43210
> 614 688-3768
>

--
Donald B. Rubin
John L. Loeb Professor of Statistics
Chairman Department of Statistics
Harvard University
Cambridge MA 02138
Tel: 617-495-5498 Fax: 617-496-8057
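
A quick arithmetic check of the figure quoted from Hershberger and Fisher may be useful here. The sketch below only evaluates the formula as Bill Howells wrote it, n >= (z**2)(V**2)/e**2, with the values given in his message (z=1.96, V**2 of about 1.33, e=.10). The function name required_n is made up for illustration and is not taken from the paper.

```python
import math


def required_n(z: float, cv_squared: float, e: float) -> int:
    """Smallest integer n satisfying n >= z**2 * V**2 / e**2.

    z          : standard normal quantile (1.96 in the quoted example)
    cv_squared : squared coefficient of variation, V**2 (about 1.33 as quoted)
    e          : tolerated "amount of error" (.10 in the quoted example)
    """
    return math.ceil(z ** 2 * cv_squared / e ** 2)


# Values exactly as quoted in the thread.
n = required_n(z=1.96, cv_squared=1.33, e=0.10)
print(n)
```

With those inputs the bound rounds up to the 511 quoted above. Note that nothing in the formula involves the data or the fraction of missing information; only z, V**2 and e enter.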

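
Paul Allison's rule of thumb (enough imputations to push the estimated df over 100) can be compared with the efficiency argument for about five imputations. The sketch below assumes the large-sample formulas from Rubin (1987): r = (1 + 1/M) B / U, df = (M - 1)(1 + 1/r)**2, and relative efficiency 1 / (1 + gamma/M), where B is the between-imputation variance, U the average within-imputation variance, and gamma the fraction of missing information. The function names and the example values B = U = 1 are hypothetical, and B and U are treated as fixed although in practice they are estimates that change as imputations are added. Software that applies a small-sample df adjustment will give different numbers.

```python
def rubin_df(m: int, between_var: float, within_var: float) -> float:
    """Large-sample MI degrees of freedom, assuming Rubin's (1987) formula."""
    r = (1 + 1 / m) * between_var / within_var  # relative increase in variance
    return (m - 1) * (1 + 1 / r) ** 2


def relative_efficiency(m: int, gamma: float) -> float:
    """Efficiency of m imputations relative to infinitely many."""
    return 1 / (1 + gamma / m)


def smallest_m_for_df(between_var: float, within_var: float,
                      target_df: float = 100, max_m: int = 10_000) -> int:
    """Smallest m whose df exceeds target_df, holding B and U fixed (assumption)."""
    for m in range(2, max_m + 1):
        if rubin_df(m, between_var, within_var) > target_df:
            return m
    raise ValueError("target df not reached within max_m imputations")


# Hypothetical illustration: between and within variance equal,
# i.e. roughly half of the information missing.
for m in (5, 20):
    print(m, round(rubin_df(m, 1.0, 1.0), 1), round(relative_efficiency(m, 0.5), 3))
print(smallest_m_for_df(1.0, 1.0))
```

Under these assumptions the point estimate is already quite efficient at M=5, while the df stays well below 100 until M is several times larger, which is the gap between the two criteria that the thread is discussing.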