Craig,

Interesting comment. As I recall from the Cohen and Cohen regression text, if data on the outcome variable are missing, one is generally out of luck, because any type of imputation or missing-data technique is inappropriate. In fact, this is the direct quote:

"Because Y represents the outcome or effect of the IVs, when the Y value for a subject is not known, there is little that can be done in [regression] but drop that subject. One might regret the apparent loss of information in the IVs when the subject is dropped, but in the fixed model situation, there is no information lost, because the investigator has selected the combination of IV values to be studied" (pp. 275-276).

Do you agree with this? Are there no missing-data options when the missingness is associated with the outcome variable?

Thanks,
Dale
Dale N. Glaser, Ph.D.
Pacific Science & Engineering Group
6310 Greenwich Drive; Suite 200
San Diego, CA 92122
Phone: (858) 535-1661
Fax: (858) 535-1665
http://www.pacific-science.com

-----Original Message-----
From: Craig D. Newgard [mailto:[email protected]]
Sent: Thursday, May 23, 2002 1:43 PM
To: [email protected]; [email protected]
Subject: IMPUTE: Re: imputing covariates

Constantine,

Generally, I agree with your colleague. The danger in using an outcome variable in the imputation process is in creating associations between predictors and the outcome that would not be present otherwise. If this occurs (essentially creating confounding variables), your result could be biased. It is difficult to tell whether or not this occurs when the outcome is included in the imputation process, as differences in results between the non-imputed data and the imputed data could reflect a correction of the bias in using data restricted to non-missing values (the non-imputed dataset) or bias created from the imputation process, or both. The simplest way to deal with this is to leave out the outcome variable from the imputation process, as a benefit in precision may be offset by a loss of validity.

Craig

Craig D. Newgard, MD, MPH
Research Fellow
Department of Emergency Medicine
Harbor-UCLA Medical Center
1000 West Carson Street, Box 21
Torrance, CA 90509
(310)222-3666 (Office)
(310)782-1763 (Fax)
[email protected]

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Constantine Daskalakis
Sent: Wednesday, May 22, 2002 1:18 PM
To: [email protected]
Cc: Constantine Daskalakis
Subject: IMPUTE: imputing covariates

Hi.

I have a regression of Y on a bunch of Xs (always observed) and on Z (sometimes missing). The X's will be used to impute Z. But should Y also be used in imputing Z? My reading of the literature suggests that's not a problem and can often be a good thing in terms of gaining precision.
A colleague argues that using the outcome to impute the predictor will bias the estimated effect of that predictor in the main regression model. She argues that, by using Y, "you're stacking the deck, so to speak", i.e., the imputation determines what you'll find out in the main regression model.

Is there a heuristic response to that concern? (Or, if I'm wrong, please someone correct me!)

Thanks,
cd

PS Always assuming MAR of Z (i.e., missingness of Z does not depend on the unobserved Z itself).

________________________________________________________________
Constantine Daskalakis, ScD
Assistant Professor, Biostatistics Section,
Thomas Jefferson University,
125 S. 9th St. #402, Philadelphia, PA 19107
Tel: 215-955-5695
Fax: 215-503-3804
Email: [email protected]
Webpage: http://www.kcc.tju.edu/Science/SharedFacilities/Biostatistics
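
One heuristic way to probe the "stacking the deck" question is a small simulation. The sketch below is illustrative only and is not taken from the thread: the data-generating model, the coefficients, the sample size, the MAR mechanism (missingness of Z depending only on X), and the use of a single stochastic regression imputation instead of proper multiple imputation are all assumptions, and the function names are hypothetical. It imputes Z once from X alone and once from X and Y, then fits the main regression of Y on X and Z each time.

```python
import numpy as np


def ols(design, target):
    """Least-squares coefficients for target ~ design (design holds the intercept column)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def stochastic_impute(predictors, z, observed, rng):
    """Fill missing z with a regression prediction plus a random residual draw.

    The imputation model is fitted on the complete cases only.
    """
    design = np.column_stack([np.ones(len(z)), predictors])
    coef = ols(design[observed], z[observed])
    resid = z[observed] - design[observed] @ coef
    sd = resid.std(ddof=design.shape[1])
    missing = ~observed
    z_filled = z.copy()
    z_filled[missing] = design[missing] @ coef + rng.normal(0.0, sd, missing.sum())
    return z_filled


def simulate(n=20000, beta_z=1.0, seed=0):
    """Return the estimated Z coefficient under each imputation model (assumed setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = 0.5 * x + rng.normal(size=n)
    y = 1.0 * x + beta_z * z + rng.normal(size=n)

    # MAR: the chance that Z is missing depends only on the always-observed X.
    p_missing = 1.0 / (1.0 + np.exp(-x))
    observed = rng.uniform(size=n) > p_missing
    z_obs = np.where(observed, z, np.nan)

    imputation_models = {
        "x_only": x[:, None],
        "x_and_y": np.column_stack([x, y]),
    }
    estimates = {}
    for label, predictors in imputation_models.items():
        z_filled = stochastic_impute(predictors, z_obs, observed, rng)
        main_design = np.column_stack([np.ones(n), x, z_filled])
        estimates[label] = ols(main_design, y)[2]  # coefficient on Z
    return estimates


if __name__ == "__main__":
    for label, estimate in simulate().items():
        print(f"imputation using {label}: estimated Z coefficient = {estimate:.3f}")
```

Under these assumptions, the imputed Z values from the X-only model carry no information about Y beyond X, so the Z coefficient in the main regression tends to be pulled toward zero, while the model that includes Y tends to recover a value near the true coefficient. A different missingness mechanism or a misspecified imputation model could change that picture, so this is a starting point for the discussion and not a general proof.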
