Hi Paul,


On Thu, 19 Feb 2004, Paul von Hippel wrote:

> Rubin (1987, Table 4.2) shows that even with .9 missing information, 
> confidence intervals using as few as 5 imputations will have close to their 
> nominal level of coverage. But increasing M beyond 5 has benefits 
> nonetheless. It increases df, narrowing confidence intervals while 
> maintaining their coverage levels.
> 

Right.


> A while back, I simulated 10,000 observations where X1 and X2 were complete 
> and independent, and half the Y values were missing as a function of X1. 
> Because of the high missingness, the regression parameters had only 11-16 
> df, even when I used M=20 imputations.
> 

That depends on the relationship between Y and X1 and on the way the Y
values are missing.  Sounds like you created a large fraction of missing
information, so my guess is that you deleted essentially all of the Ys with
any leverage for estimating the slope on X1, i.e., all the Ys whose X1s are
far from the mean; the answer therefore also depends on the distn of X1,
etc.  Was it all Gaussian?
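To make the leverage point concrete, here is a toy sketch (my own
illustration, not the actual simulation): the sampling variance of the OLS
slope is sigma^2 / Sxx, where Sxx = sum((x - mean(x))^2), so deleting the Ys
whose X1 values lie far from the mean removes almost all of Sxx and hence
almost all of the information about the slope.

```python
# Slope variance in simple regression is sigma^2 / Sxx.  Deleting
# the Ys at extreme X1 values wipes out most of Sxx, even though
# most observations remain.

def sxx(xs):
    """Sum of squared deviations of xs about their mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

x_full = [-3, -2, -1, 0, 1, 2, 3]
x_kept = [x for x in x_full if abs(x) <= 1]   # Ys missing when |X1| > 1

print(sxx(x_full), sxx(x_kept))   # 28.0 vs 2.0: slope variance up 14-fold
```

Here fewer than half the cases are dropped, yet the slope variance rises by
a factor of 14, which is the sense in which such deletion creates a very
large fraction of missing information about the slope.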


> This struck me as odd, since when only Y is missing, and missing at random, 
> maximum likelihood regression estimates are the same as those obtained from 
> listwise deletion. The listwise estimates would have ~5000 df, and it seems 
> strange that the MI df would be so much lower.
> 

Still seems a bit odd, but that would depend on the specific distns and 
missingness rule.
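As a quick numerical check on the two formulas at issue in this thread
(Rubin's 1987 degrees-of-freedom formula for MI estimates, and the Levy and
Lemeshow formula quoted below), a sketch in Python; the function names are
mine:

```python
import math

def rubin_df(m, ubar, b):
    """Rubin (1987) df for an MI estimate: nu = (m - 1)(1 + 1/r)^2,
    where r = (1 + 1/m) * B / Ubar is the relative increase in
    variance due to missingness (Ubar = average within-imputation
    variance, B = between-imputation variance)."""
    r = (1 + 1 / m) * b / ubar
    return (m - 1) * (1 + 1 / r) ** 2

def hershberger_n(z=1.96, v2=1.33, e=0.10):
    """The Levy & Lemeshow (1999) formula as quoted in the note:
    n >= z**2 * V**2 / e**2."""
    return math.ceil(z ** 2 * v2 / e ** 2)

# With high missing information (here B = 10 * Ubar), df stays near
# m - 1, so adding imputations raises the df roughly one-for-one:
print(round(rubin_df(5, 1.0, 10.0), 1))    # ~4.7
print(round(rubin_df(20, 1.0, 10.0), 1))   # ~22.8
print(hershberger_n())                     # 511, matching the note
```

Note that this original nu is bounded below by m - 1, so df of 11-16 with
m = 20 would have to come from the Barnard-Rubin (1999) small-sample
adjustment, which can fall below m - 1.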

Best, Don


> Best wishes,
> Paul von Hippel
> 
> At 11:31 AM 2/19/2004, Paul Allison wrote:
> >Some further thoughts:
> >
> >1. The arguments I've seen for using around five imputations are based
> >on efficiency calculations for the parameter estimates.  But what about
> >standard errors and p-values?  I've found them to be rather unstable for
> >moderate to large fractions of missing information.
> >
> >2. Joe Schafer told me several months ago that he had a dissertation
> >student whose work showed that substantially larger numbers of
> >imputations were often required for good inference.  But I don't know
> >any of the details.
> >
> >3. For these reasons, I've adopted the following rule of thumb: Do a
> >sufficient number of imputations to get the estimated DF over 100 for
> >all parameters of interest.  I'd love to know what others think of this.
> >
> >
> >----------------------------------------------------------------
> >Paul D. Allison, Professor & Chair
> >Department of Sociology
> >University of Pennsylvania
> >3718 Locust Walk
> >Philadelphia, PA  19104-6299
> >voice: 215-898-6717 or 215-898-6712
> >fax: 215-573-2081
> >[email protected]
> >http://www.ssc.upenn.edu/~allison
> >
> >
> >
> >
> >
> >I'm baffled too on both counts.  Modest numbers of imputations work fine
> >unless the fractions of missing information are very high (> 50%), and
> >then I wouldn't think of those situations as missing data problems
> >except in a formal sense.  And the number of them is a random
> >variable???  I
> >guess we'll have to read what they wrote...
> >
> >
> >
> >On Thu, 19 Feb 2004, Howells, William wrote:
> >
> > > I came across a note from Hershberger and Fisher on the number of
> > > imputations (citation below), where they conclude that a much larger
> > > number of imputations is required (over 500 in some cases) than the
> > > usual rule of thumb that a relatively small number of imputations is
> > > needed (say 5 to 20 per Rubin 1987, Schafer 1997).  They argue that
> > > the traditional rules of thumb are based on simulations rather than
> > > sampling theory.  Their calculations assume that the number of
> > > imputations is a random variable from a uniform distribution and use a
> > > formula from Levy and Lemeshow (1999), n >= (z**2)(V**2)/e**2, where n
> > > is the number of imputations, z is a standard normal variable, V**2 is
> > > the squared coefficient of variation (~1.33), and e is the "amount of
> > > error, or the degree to which the predicted number of imputations
> > > differs from the optimal or 'true' number of imputations".  For
> > > example, with z=1.96 and e=.10, n=511 imputations are required.
> > >
> > >
> > >
> > > I'm having difficulty conceiving of the number of imputations as a
> > > random variable.  What does the "true" number of imputations mean?  Is
> > > this argument legitimate?  Should I be using 500 imputations instead
> > > of 5?
> > >
> > >
> > >
> > > Bill Howells, MS
> > > Behavioral Medicine Center
> > > Washington University School of Medicine
> > > St Louis, MO
> > >
> > > Hershberger SL, Fisher DG (2003), Note on determining the number of
> > > imputations for missing data, Structural Equation Modeling, 10(4):
> > > 648-650.
> > >
> > >
> > >
> > > http://www.leaonline.com/loi/sem
> > >
> > >
> > >
> > >
> >
> >--
> >Donald B. Rubin
> >John L. Loeb Professor of Statistics
> >Chairman Department of Statistics
> >Harvard University
> >Cambridge MA 02138
> >Tel: 617-495-5498  Fax: 617-496-8057
> 
> Paul von Hippel
> Department of Sociology / Initiative in Population Research
> Ohio State University
> 300 Bricker Hall
> 190 N. Oval Mall
> Columbus OH 43210
> 614 688-3768
> 
> 

-- 
Donald B. Rubin
John L. Loeb Professor of Statistics
Chairman Department of Statistics
Harvard University
Cambridge MA 02138
Tel: 617-495-5498  Fax: 617-496-8057
