Craig, an interesting comment. As I recall from the Cohen and Cohen regression
text, if data on the outcome variable are missing, one is generally out of
luck, as any type of imputation/missing data technique is inappropriate. In
fact, this is the direct quote: "Because Y represents the outcome or effect of
the IVs, when the Y value for a subject is not known, there is little that can
be done in [regression] but drop that subject.  One might regret the apparent
loss of information in the IVs when the subject is dropped, but in the fixed
model situation, there is no information lost, because the investigator has
selected the combination of IV values to be studied" (pp. 275-276). Do
you agree with this? Are there no missing data options when the missingness
is associated with the outcome variable? Thanks, Dale
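
A small simulation may help make the quoted claim concrete. The sketch below is only an illustration under stated assumptions, not anything from the Cohen and Cohen text: a single predictor, a linear model with made-up coefficients (intercept 2, slope 3), and Y values that go missing with a probability that depends on X alone. The function names are hypothetical. It compares dropping the cases with missing Y against filling them in with predictions from the complete-case regression and refitting.

```python
import numpy as np

def fit_ols(x, y):
    """Return [intercept, slope] from an ordinary least squares fit of y on x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def compare_drop_vs_impute(n=5000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Assumed data-generating model (illustrative values only).
    y = 2.0 + 3.0 * x + rng.normal(size=n)

    # Assumption: Y is more likely to be missing at high X, and the
    # missingness does not depend on Y itself once X is known.
    p_missing = 1.0 / (1.0 + np.exp(-x))
    observed = rng.uniform(size=n) > p_missing

    # Option 1: drop every subject whose Y is missing.
    coef_drop = fit_ols(x[observed], y[observed])

    # Option 2: fill missing Y with the complete-case prediction, then refit.
    y_filled = y.copy()
    y_filled[~observed] = coef_drop[0] + coef_drop[1] * x[~observed]
    coef_imputed = fit_ols(x, y_filled)

    return coef_drop, coef_imputed

if __name__ == "__main__":
    dropped, imputed = compare_drop_vs_impute()
    print("drop cases   :", dropped)
    print("impute + refit:", imputed)
```

In this setup the two sets of coefficients agree to floating-point precision, because the filled-in points lie exactly on the complete-case line and so add nothing to the normal equations. That is one way to read "no information lost". The sketch says nothing about the case where missingness depends on the unobserved Y value itself, and the refit standard errors would be too small if the filled-in rows were treated as real data.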

Dale N. Glaser, Ph.D.
Pacific Science & Engineering Group
6310 Greenwich Drive; Suite 200
San Diego, CA 92122 
Phone: (858) 535-1661 Fax: (858) 535-1665
http://www.pacific-science.com

-----Original Message-----
From: Craig D. Newgard [mailto:[email protected]]
Sent: Thursday, May 23, 2002 1:43 PM
To: [email protected]; [email protected]
Subject: IMPUTE: Re: imputing covariates

Constantine,
        Generally, I agree with your colleague.  The danger in using an outcome
variable in the imputation process is that it can create associations between
predictors and the outcome that would not otherwise be present.  If this
occurs (essentially creating confounding variables), your results could be
biased.  It is difficult to tell whether this has occurred when the outcome
is included in the imputation process: differences in results between the
non-imputed data and the imputed data could reflect a correction of the bias
from restricting the data to non-missing values (the non-imputed dataset),
bias created by the imputation process, or both.  The simplest way to deal
with this is to leave the outcome variable out of the imputation process,
as any gain in precision may be offset by a loss of validity.

Craig

Craig D. Newgard, MD, MPH
Research Fellow
Department of Emergency Medicine
Harbor-UCLA Medical Center
1000 West Carson Street, Box 21
Torrance, CA 90509
(310)222-3666 (Office)
(310)782-1763 (Fax)
[email protected]


-----Original Message-----
From: [email protected] [mailto:[email protected]] on
Behalf Of Constantine Daskalakis
Sent: Wednesday, May 22, 2002 1:18 PM
To: [email protected]
Cc: Constantine Daskalakis
Subject: IMPUTE: imputing covariates


Hi.

I have a regression of Y on a bunch of Xs (always observed) and on Z
(sometimes missing).

The Xs will be used to impute Z. But should Y also be used in imputing Z?

My reading of the literature suggests that's not a problem and can often be
a good thing in terms of gaining precision. A colleague argues that using
the outcome to impute the predictor will bias the estimated effect of that
predictor in the main regression model. She argues that, by using Y,
"you're stacking the deck, so to speak", i.e., the imputation determines what
you'll find in the main regression model.

Is there a heuristic response to that concern?
(Or, if I'm wrong, please someone correct me!)

Thanks,
cd

PS  Always assuming MAR of Z (i.e., given the observed data, missingness of Z
does not depend on the unobserved Z itself).
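
One way to probe the "stacking the deck" concern is to simulate it. The sketch below is a rough illustration under assumptions that are not in the original question: one X, normally distributed errors, true coefficients of 1.0 for both X and Z, missingness of Z that depends on X only, and a single stochastic regression imputation (fitted value plus a random residual draw) rather than proper multiple imputation. All function names are hypothetical.

```python
import numpy as np

def ols(design, target):
    """Least squares coefficients for target regressed on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def impute_z(predictors, z, observed, rng):
    """Stochastic regression imputation of z from the given predictors.

    The imputation model is fit on cases with z observed; each missing z
    gets its predicted value plus a normal draw with the residual SD.
    """
    design = np.column_stack([np.ones(len(z)), predictors])
    coef = ols(design[observed], z[observed])
    resid = z[observed] - design[observed] @ coef
    sigma = resid.std(ddof=design.shape[1])
    filled = z.copy()
    n_missing = int((~observed).sum())
    filled[~observed] = design[~observed] @ coef + rng.normal(0.0, sigma, n_missing)
    return filled

def simulate(n=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z_true = 0.5 * x + rng.normal(size=n)
    # Assumed main model: the true coefficient of Z is 1.0.
    y = 1.0 * x + 1.0 * z_true + rng.normal(size=n)

    # Assumption: MAR, with missingness of Z driven by X only.
    observed = rng.uniform(size=n) > 1.0 / (1.0 + np.exp(-x))
    z = np.where(observed, z_true, np.nan)

    estimates = {}
    for label, predictors in (("without_y", x), ("with_y", np.column_stack([x, y]))):
        z_filled = impute_z(predictors, z, observed, rng)
        main_design = np.column_stack([np.ones(n), x, z_filled])
        estimates[label] = ols(main_design, y)[2]  # coefficient of Z
    return estimates

if __name__ == "__main__":
    for label, beta_z in simulate().items():
        print(f"Z coefficient, imputation {label}: {beta_z:.3f}")
```

Under these assumptions, the estimate from the imputation that uses Y should land near the true value of 1.0, while the one that leaves Y out should be pulled noticeably toward zero: the imputed Z values carry no association with Y beyond what X explains, so they dilute the Z effect in the main regression. This is only one simulated scenario with a single imputation, so it should be read as a heuristic, not as a general proof.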



________________________________________________________________

Constantine Daskalakis, ScD
Assistant Professor,
Biostatistics Section, Thomas Jefferson University,
125 S. 9th St. #402, Philadelphia, PA 19107
    Tel: 215-955-5695
    Fax: 215-503-3804
    Email: [email protected]
    Webpage: http://www.kcc.tju.edu/Science/SharedFacilities/Biostatistics


