I don't see your simulation results as odd at all.  When only the
dependent variable has missing data and there are no "auxiliary"
variables, listwise deletion is optimal (it's the ML estimate), and
certainly superior to multiple imputation.  The imputation process
introduces additional random variation into the estimates and, as you
discovered in your simulation, that additional variation can be
substantial.
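
(For anyone who wants to reproduce the effect: below is a minimal
Python sketch of the setup Paul von Hippel describes in the quoted
message, with 10,000 cases, X1 and X2 complete and independent, and
roughly half of Y missing as a function of X1.  It compares listwise
deletion with a hand-rolled "proper" normal-model imputation pooled by
Rubin's rules.  The coefficients and the logistic missingness
mechanism are illustrative choices of mine, not details from the
thread.)

import numpy as np

rng = np.random.default_rng(0)
n, M = 10_000, 20

# Complete, independent predictors; Y missing as a function of X1 (MAR).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
miss = rng.random(n) < 1 / (1 + np.exp(-2 * x1))  # ~half missing, driven by X1

X = np.column_stack([np.ones(n), x1, x2])
obs = ~miss
Xo, yo = X[obs], y[obs]

# Listwise deletion: OLS on the complete cases (the ML estimate here).
beta_lw, *_ = np.linalg.lstsq(Xo, yo, rcond=None)

# "Proper" normal-model imputation: draw sigma^2 and beta from their
# posterior, then draw the missing Y values; repeat M times.
k = X.shape[1]
XtX_inv = np.linalg.inv(Xo.T @ Xo)
beta_hat = XtX_inv @ Xo.T @ yo
rss = float(((yo - Xo @ beta_hat) ** 2).sum())
L = np.linalg.cholesky(XtX_inv)

Q, U = [], []
for _ in range(M):
    sigma2 = rss / rng.chisquare(obs.sum() - k)
    beta_d = beta_hat + np.sqrt(sigma2) * (L @ rng.normal(size=k))
    y_imp = y.copy()
    y_imp[miss] = X[miss] @ beta_d + np.sqrt(sigma2) * rng.normal(size=int(miss.sum()))
    b, *_ = np.linalg.lstsq(X, y_imp, rcond=None)
    s2 = float(((y_imp - X @ b) ** 2).sum()) / (n - k)
    Q.append(b)
    U.append(s2 * np.diag(np.linalg.inv(X.T @ X)))

# Rubin's rules: total variance = within + (1 + 1/M) * between.
Q, U = np.array(Q), np.array(U)
qbar, ubar = Q.mean(axis=0), U.mean(axis=0)
B = Q.var(axis=0, ddof=1)
T = ubar + (1 + 1 / M) * B

print("listwise estimates: ", beta_lw)
print("MI pooled estimates:", qbar)
print("variance share added by imputation:", (1 + 1 / M) * B / T)

The point of the sketch: the pooled MI estimates track the listwise
ones, but the between-imputation component B is pure added noise,
which is the extra variation described above.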

----------------------------------------------------------------
Paul D. Allison, Professor & Chair
Department of Sociology
University of Pennsylvania
3718 Locust Walk
Philadelphia, PA  19104-6299
voice: 215-898-6717 or 215-898-6712
fax: 215-573-2081
[email protected]
http://www.ssc.upenn.edu/~allison
 


-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Paul von Hippel
Sent: Thursday, February 19, 2004 12:31 PM
To: [email protected]
Subject: IMPUTE: Re: number imputations recommended by Hershberger and
Fisher


Rubin (1987, Table 4.2) shows that even with .9 missing information,
confidence intervals using as few as 5 imputations will have close to
their nominal level of coverage. But increasing M beyond 5 has
benefits nonetheless. It increases df, narrowing confidence intervals
while maintaining their coverage levels.
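
For reference, the df in question comes from Rubin's (1987) combining
rules: nu = (M - 1) * (1 + Ubar / ((1 + 1/M) * B))^2, where Ubar is
the average within-imputation variance and B the between-imputation
variance. A small Python sketch, with fractions of missing information
chosen purely for illustration:

# Rubin's (1987) df for an MI estimate: nu = (M - 1) * (1 + 1/r)^2,
# where r = (1 + 1/M) * B / Ubar is the relative increase in variance.
# Here gamma is (roughly) the fraction of missing information.
def mi_df(M, gamma):
    r = gamma / (1 - gamma)  # r implied by the missing-information fraction
    return (M - 1) * (1 + 1 / r) ** 2

for M in (5, 20, 100):
    for gamma in (0.5, 0.9):
        print(f"M={M:3d}  gamma={gamma}  df={mi_df(M, gamma):7.1f}")

Note that this classic formula can never fall below M - 1, which makes
df under 19 with M=20 (as in the simulation described next) genuinely
surprising under these rules; some software instead applies the
small-sample adjustment of Barnard and Rubin (1999), which can give
smaller values.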

A while back, I simulated 10,000 observations where X1 and X2 were
complete and independent, and half the Y values were missing as a
function of X1.

Because of the high missingness, the regression parameters had only
11-16 df, even when I used M=20 imputations.

This struck me as odd, since when only Y is missing, and missing at
random, maximum likelihood regression estimates are the same as those
obtained from listwise deletion. The listwise estimates would have
~5000 df, and it seems strange that the MI df would be so much lower.

Best wishes,
Paul von Hippel

At 11:31 AM 2/19/2004, Paul Allison wrote:
>Some further thoughts:
>
>1. The arguments I've seen for using around five imputations are based
>on efficiency calculations for the parameter estimates.  But what
>about standard errors and p-values?  I've found them to be rather
>unstable for moderate to large fractions of missing information.
>
>2. Joe Schafer told me several months ago that he had a dissertation 
>student whose work showed that substantially larger numbers of 
>imputations were often required for good inference.  But I don't know 
>any of the details.
>
>3. For these reasons, I've adopted the following rule of thumb: Do a 
>sufficient number of imputations to get the estimated DF over 100 for 
>all parameters of interest.  I'd love to know what others think of 
>this.
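
Plugging the df formula noted earlier into this rule of thumb gives a
back-of-envelope target: with gamma the fraction of missing
information, nu is roughly (M - 1)/gamma^2, so df >= 100 inverts to
M >= 1 + 100*gamma^2. A tiny sketch (my own extrapolation, not from
the thread):

import math

# Invert nu ~ (M - 1) / gamma^2 to find the smallest M that pushes
# the MI df past 100 (gamma = fraction of missing information).
for gamma in (0.1, 0.3, 0.5, 0.7, 0.9):
    M_min = math.ceil(1 + 100 * gamma ** 2)
    print(f"gamma={gamma}: need M >= {M_min} for df >= 100")

By this reckoning, even 90% missing information calls for M of about
82, far from the 500+ discussed below.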
>
>
>----------------------------------------------------------------
>Paul D. Allison, Professor & Chair
>Department of Sociology
>University of Pennsylvania
>3718 Locust Walk
>Philadelphia, PA  19104-6299
>voice: 215-898-6717 or 215-898-6712
>fax: 215-573-2081
>[email protected]
>http://www.ssc.upenn.edu/~allison
>
>
>
>
>
>I'm baffled too on both counts.  Modest numbers of imputations work
>fine unless the fractions of missing information are very high (> 50%),
>and then I wouldn't think of those situations as missing data problems
>except in a formal sense.  And the number of them is a random
>variable???  I guess we'll have to read what they wrote...
>
>
>
>On Thu, 19 Feb 2004, Howells, William wrote:
>
> > I came across a note from Hershberger and Fisher on the number of
> > imputations (citation below), where they conclude that a much larger
> > number of imputations is required (over 500 in some cases) than the
> > usual rule of thumb that a relatively small number of imputations is
> > needed (say 5 to 20 per Rubin 1987, Schafer 1997).  They argue that
> > the traditional rules of thumb are based on simulations rather than
> > sampling theory.  Their calculations assume that the number of
> > imputations is a random variable from a uniform distribution and use
> > a formula from Levy and Lemeshow (1999), n >= z^2 * V^2 / e^2, where
> > n is the number of imputations, z is a standard normal variable, V^2
> > is the squared coefficient of variation (~1.33), and e is the
> > "amount of error, or the degree to which the predicted number of
> > imputations differs from the optimal or 'true' number of
> > imputations".  For example, with z=1.96 and e=.10, n=511 imputations
> > are required.
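
As a quick arithmetic check of the quoted calculation (values exactly
as given above):

# Check the quoted Levy and Lemeshow formula: n >= z^2 * V^2 / e^2.
z, V2, e = 1.96, 1.33, 0.10
print(z ** 2 * V2 / e ** 2)  # ~510.9, i.e. 511 after rounding up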
> >
> >
> >
> > I'm having difficulty conceiving of the number of imputations as a
> > random variable.  What does "true" number of imputations mean?  Is
> > this argument legitimate?  Should I be using 500 imputations instead
> > of 5?
> >
> >
> >
> > Bill Howells, MS
> > Behavioral Medicine Center
> > Washington University School of Medicine
> > St Louis, MO
> >
> > Hershberger SL, Fisher DG (2003), Note on determining the number of
> > imputations for missing data, Structural Equation Modeling, 10(4):
> > 648-650.
> >
> > http://www.leaonline.com/loi/sem
> >
>
>--
>Donald B. Rubin
>John L. Loeb Professor of Statistics
>Chairman Department of Statistics
>Harvard University
>Cambridge MA 02138
>Tel: 617-495-5498  Fax: 617-496-8057

Paul von Hippel
Department of Sociology / Initiative in Population Research
Ohio State University
300 Bricker Hall, 190 N. Oval Mall
Columbus OH 43210
614-688-3768

