I came across a note by Hershberger and Fisher on the number of imputations (citation below). They conclude that far more imputations are required (over 500 in some cases) than the usual rule of thumb suggests, namely that a relatively small number (say 5 to 20, per Rubin 1987 and Schafer 1997) is sufficient. They argue that the traditional rules of thumb are based on simulations rather than on sampling theory. Their calculations assume that the number of imputations is a random variable from a uniform distribution. They use a formula from Levy and Lemeshow (1999), n >= (z**2)(V**2)/e**2. In this formula, n is the number of imputations and z is the standard normal deviate for the chosen confidence level. V**2 is the squared coefficient of variation (~1.33), and e is the "amount of error, or the degree to which the predicted number of imputations differs from the optimal or 'true' number of imputations". For example, with z = 1.96 and e = .10, n = 511 imputations are required.
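The sketch below is a minimal Python check of the arithmetic in the formula quoted above. It only reproduces the computation and does not endorse treating the number of imputations as a random variable. The function name `hf_required_imputations` is hypothetical. The inputs z = 1.96 and V**2 = 1.33 are the values reported from the note and are taken as given here, not independently derived.

```python
import math


def hf_required_imputations(z: float, cv_squared: float, e: float) -> int:
    """Smallest integer n with n >= z**2 * V**2 / e**2 (the bound as quoted).

    Hypothetical helper for illustration only.
    z          -- standard normal deviate (1.96 for 95% confidence)
    cv_squared -- squared coefficient of variation V**2 (the note uses ~1.33)
    e          -- tolerated error
    """
    return math.ceil(z**2 * cv_squared / e**2)


if __name__ == "__main__":
    # Reproduces the example in the message: z = 1.96, V**2 = 1.33, e = .10
    print(hf_required_imputations(1.96, 1.33, 0.10))  # 511
    # The bound grows as 1/e**2, so halving e roughly quadruples n
    print(hf_required_imputations(1.96, 1.33, 0.05))  # 2044
```

The result is driven almost entirely by the choice of e. Tightening e from .10 to .05 takes the answer from 511 to 2044 imputations without any change in the data.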
I'm having difficulty conceiving of the number of imputations as a random variable. What does the "true" number of imputations mean? Is this argument legitimate? Should I be using 500 imputations instead of 5?

Bill Howells, MS
Behavioral Medicine Center
Washington University School of Medicine
St Louis, MO

Hershberger SL, Fisher DG (2003). Note on determining the number of imputations for missing data. Structural Equation Modeling, 10(4): 648-650. http://www.leaonline.com/loi/sem

From: rubin <@t> stat.harvard.edu (Donald Rubin)
Date: Thu Feb 19 10:18:56 2004
Subject: IMPUTE: Re: number imputations recommended by Hershberger and Fisher
In reply to: Howells, William, Thu, 19 Feb 2004

I'm baffled too, on both counts. Modest numbers of imputations work fine unless the fractions of missing information are very high (> 50%). I wouldn't think of those situations as missing-data problems except in a formal sense. And the number of imputations is a random variable??? I guess we'll have to read what they wrote...

-- 
Donald B. Rubin
John L. Loeb Professor of Statistics
Chairman, Department of Statistics
Harvard University
Cambridge MA 02138
Tel: 617-495-5498
Fax: 617-496-8057
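Rubin's point that modest numbers of imputations work fine can be quantified with the large-sample relative efficiency from Rubin (1987). This formula comes from that reference, not from this thread. An estimate based on m imputations has efficiency (1 + gamma/m)**-1 on the variance scale, relative to one based on infinitely many imputations. Here gamma is the fraction of missing information. The following is a minimal sketch under that approximation. The function names `relative_efficiency` and `se_inflation` are hypothetical.

```python
import math


def relative_efficiency(m: int, gamma: float) -> float:
    """Large-sample efficiency (variance scale) of m imputations relative to
    infinitely many, per Rubin (1987): (1 + gamma/m) ** -1.

    gamma is the fraction of missing information for the estimand.
    Hypothetical helper for illustration only.
    """
    return 1.0 / (1.0 + gamma / m)


def se_inflation(m: int, gamma: float) -> float:
    """Ratio of the standard error with m imputations to that with infinitely many."""
    return math.sqrt(1.0 + gamma / m)


if __name__ == "__main__":
    print(round(relative_efficiency(5, 0.5), 3))  # 0.909
    print(round(se_inflation(5, 0.5), 3))  # 1.049
    # Compare several fractions of missing information and numbers of imputations
    for gamma in (0.1, 0.3, 0.5, 0.7):
        row = [f"m={m}: {relative_efficiency(m, gamma):.3f}" for m in (5, 20, 511)]
        print(f"gamma={gamma}: " + ", ".join(row))
```

Even with half the information missing, 5 imputations give about 91% efficiency. That corresponds to standard errors roughly 5% larger than with infinitely many imputations, and 511 imputations raise the efficiency only to about 0.999. Large numbers of imputations buy little unless gamma is very high, which is consistent with the reply above.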
