Tim Hesterberg, a statistician at Insightful Corporation (makers of S-Plus), has given me some excellent notes motivating why Y must be used. I have put these in the Multiple Imputation section of the following web page:
http://hesweb1.med.virginia.edu/biostat/rms

His notes include some S code to demonstrate what he's talking about.

Frank Harrell

On Wed, 22 May 2002 16:17:57 -0400, Constantine Daskalakis wrote:
> Hi.
>
> I have a regression of Y on a bunch of Xs (always observed) and on Z
> (sometimes missing).
>
> The Xs will be used to impute Z. But should Y also be used in imputing Z?
>
> My reading of the literature suggests that's not a problem and can often be
> a good thing in terms of gaining precision. A colleague argues that using
> the outcome to impute the predictor will bias the estimated effect of that
> predictor in the main regression model. She argues that, by using Y,
> "you're stacking the deck, so to speak", i.e., the imputation determines what
> you'll find out in the main regression model.
>
> Is there a heuristic response to that concern?
> (Or, if I'm wrong, please someone correct me!)
>
> Thanks,
> cd
>
> PS Always assuming MAR of Z (i.e., missingness of Z does not depend on the
> unobserved Z itself).
>
> ________________________________________________________________
>
> Constantine Daskalakis, ScD
> Assistant Professor,
> Biostatistics Section, Thomas Jefferson University,
> 125 S. 9th St. #402, Philadelphia, PA 19107
> Tel: 215-955-5695
> Fax: 215-503-3804
> Webpage: http://www.kcc.tju.edu/Science/SharedFacilities/Biostatistics

--
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
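Hesterberg's notes and S code are not reproduced here. The Python sketch below is only a rough, hypothetical illustration of the question being argued. It is not his code. It rests on several assumptions of our own: a made-up linear-normal model (Z = 0.5 X + noise, Y = 1 + X + 1.0 Z + noise), a MAR mechanism in which Z is more often missing when X > 0, and single stochastic regression imputation (predicted value plus a normal residual draw). That last choice is a simplification; proper multiple imputation would also draw the imputation-model parameters and combine several completed data sets. The helper names `ols`, `fit_and_sigma` and `simulate` are illustrative only.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Build X'X and X'y.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Gaussian elimination with partial pivoting, then back-substitution.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    coef = [0.0] * p
    for k in range(p - 1, -1, -1):
        coef[k] = (b[k] - sum(A[k][c] * coef[c] for c in range(k + 1, p))) / A[k][k]
    return coef


def fit_and_sigma(columns, y):
    """OLS fit plus the residual standard deviation."""
    coef = ols(columns, y)
    n = len(y)
    resid = [y[i] - coef[0] - sum(coef[j + 1] * columns[j][i] for j in range(len(columns)))
             for i in range(n)]
    sigma = (sum(r * r for r in resid) / (n - len(coef))) ** 0.5
    return coef, sigma


def simulate(n=5000, beta_z=1.0, seed=1):
    """Return the estimated coefficient on Z under three ways of handling missing Z."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [1.0 + 1.0 * xi + beta_z * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    # MAR: the chance that Z is missing depends only on the always-observed X.
    missing = [rng.random() < (0.6 if xi > 0 else 0.2) for xi in x]
    obs = [i for i in range(n) if not missing[i]]

    def impute(predictors):
        # Fit Z ~ predictors on complete cases only, then fill each missing Z
        # with its predicted value plus a random residual draw.
        cols_obs = [[p[i] for i in obs] for p in predictors]
        coef, sigma = fit_and_sigma(cols_obs, [z[i] for i in obs])
        out = []
        for i in range(n):
            if missing[i]:
                pred = coef[0] + sum(c * p[i] for c, p in zip(coef[1:], predictors))
                out.append(pred + rng.gauss(0, sigma))
            else:
                out.append(z[i])
        return out

    z_without_y = impute([x])
    z_with_y = impute([x, y])
    return {
        "complete data": ols([x, z], y)[2],
        "impute from X only": ols([x, z_without_y], y)[2],
        "impute from X and Y": ols([x, z_with_y], y)[2],
    }


if __name__ == "__main__":
    for name, b in simulate().items():
        print(f"{name:22s} coefficient on Z: {b:.3f}")
```

Under these assumptions, the "X only" estimate should come out well below the true value of 1.0. A back-of-envelope calculation for this particular setup puts it near 0.6, the fraction of rows with Z observed. The reason is that imputations drawn without Y treat Z as unrelated to Y given X, which dilutes the association. The "X and Y" estimate should land close to 1.0. This does not "stack the deck": the imputation model is fitted only on rows where Z was actually observed. It therefore carries over the Y-Z relationship that is already in the data rather than inventing one.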
