You need to tell us more about the objectives of the analysis and the target population of the study, and how the high CRP levels relate to those. You might, for example, be interested only in the population of people who were uninfected at the time the CRP measure was taken; in that case you would want to remove each infected person's entire data from the analysis. Or perhaps you are interested in each person's "normal" uninfected CRP level, which is missing for some because they were temporarily infected when the measure was taken; then you might discard the uninformative data on levels when infected and impute the missing "normal" values (just as you would if the scale were broken when some people went in to be weighed). Of course, the second approach is only good if the other variables are unaffected by the infection.
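To make the two options concrete, here is a minimal sketch in Python. The cutoff value, the variable names, and the records are all made up for illustration (the thread only says "above a certain level"), and the sketch covers only the recoding step, not the multiple imputation that would follow it.

```python
# Hypothetical cutoff in mg/L; an assumption for illustration, not taken from the study.
CRP_CUTOFF = 10.0

# Made-up records: one earlier-wave covariate plus the age-30 CRP measure.
records = [
    {"id": 1, "bmi_age25": 22.1, "crp_age30": 0.8},
    {"id": 2, "bmi_age25": 27.4, "crp_age30": 14.2},  # above cutoff, possibly infected
    {"id": 3, "bmi_age25": 24.9, "crp_age30": 2.3},
]


def listwise_delete(rows, cutoff=CRP_CUTOFF):
    """Option 1: drop the whole participant when CRP exceeds the cutoff."""
    return [r for r in rows if r["crp_age30"] <= cutoff]


def recode_as_missing(rows, cutoff=CRP_CUTOFF):
    """Option 2: keep the participant but mark the uninformative CRP as missing."""
    recoded = []
    for r in rows:
        copy = dict(r)  # copy so the original records are left untouched
        if copy["crp_age30"] > cutoff:
            copy["crp_age30"] = None  # to be filled in later by an imputation model
        recoded.append(copy)
    return recoded


deleted = listwise_delete(records)
recoded = recode_as_missing(records)
print(len(deleted), "participants kept under listwise deletion")
print(len(recoded), "participants kept when high CRP is recoded as missing")
```

Under option 2 the participant's earlier-wave data stay in the data set and can inform the imputation model, which is the point of recoding rather than deleting.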
________________________________
From: Impute -- Imputations in Data Analysis [[email protected]] On Behalf Of Jonathan Mohr [[email protected]]
Sent: Thursday, March 27, 2014 4:34 PM
To: [email protected]
Subject: Impute invalid data?

I'm writing with a question about a small sample longitudinal study where the main outcome variable is level of C-reactive protein (CRP) measured at age 30 (which is the most recent time point for which data were collected). Typically, scores above a certain level are thrown out as invalid because the high level often indicates that the person has an infection. The folks who have been doing the main data analysis simply dropped cases with unacceptably high CRP levels. However, my sense is that a better strategy might be to simply score such participants' CRP scores as missing, and then conduct analyses with multiple imputation.

I suggested this approach, and the main analyst stated that it "seems odd to me to impute CRP values for people in the first place, but to impute values for participant who actually had values that we then discarded and are now imputing seems even weirder. Maybe statistically there's no issue with doing that, but conceptually it just seems odd."

I'm writing to see what folks on this list think. I'm certainly open to arguments for listwise deletion, but I'm not currently seeing a reason to do so given that all other data from the "high CRP" participants at earlier time points appear to be valid.

Thanks in advance for your thoughts!

Jon

--
***Please note change of email to [email protected]***

Jonathan Mohr
Assistant Professor
Department of Psychology
Biology-Psychology Building
University of Maryland
College Park, MD 20742-4411
Office phone: 301-405-5907
Fax: 301-314-5966
Email: [email protected]
