Hello all,
I have a question regarding the use of a second round of imputation. If a group of data has had imputation conducted on it (say single-point imputation), and then more cases are added to the database (perhaps because the initial imputation was conducted for a midpoint analysis), what should be done for the additional analyses? For example, should the original imputation values remain while just estimating the missing data for the new cases, or should all missing values be estimated (including a re-estimation of the original cases)?

Many thanks,
Jason

**************************************************************
Jason C. Cole, PhD
Senior Statistician
Department of Psychiatry and Biobehavioral Sciences
Cousins Center for Psychoneuroimmunology
300 UCLA Medical Plaza, Room 3148
Los Angeles, CA 90095-7057
Tel: 310 267 4390
FAX: 310 794 9247
E-mail: [email protected]
http://www.cousinspni.org
**************************************************************

From TRobert.Harris <@t> UTSouthwestern.edu Thu Apr 8 16:36:36 2004
From: TRobert.Harris <@t> UTSouthwestern.edu (TRobert Harris)
Date: Sun Jun 26 08:25:01 2005
Subject: [Impute] Fwd: calculate risk score before or after imputation?
Message-ID: <[email protected]>

Submitted on behalf of Bill Howells:

>>> "Howells, William" <[email protected]> 4/8/2004 4:16:54 PM >>>
We have a regression model that uses a risk score that is calculated as a weighted sum of 10 prognostic variables. The weights were obtained from a regression analysis on another sample. In my sample of n=800, one or more of the 10 individual variables is missing in 40% of the patients, resulting in 40% missing risk score.
I calculated the risk score on the observed data and then imputed the missing risk score using the 10 variables, in addition to other variables in the analysis. Later, I found out that statisticians at our data center at another institution had imputed the 10 variables and then calculated the risk score on the imputed data. The combined inferences from the two approaches are not wildly different, e.g., a hazard ratio of 2.5 (p=0.005) vs. 2.7 (p=0.003), but I wonder if there is any theory or simulation work to help decide which approach is better?

Bill Howells, MS
Behavioral Medicine Center
Washington University School of Medicine
St Louis, MO

From arlinen <@t> amgen.com Thu Apr 8 21:54:40 2004
From: arlinen <@t> amgen.com (Nakanishi, Arline)
Date: Sun Jun 26 08:25:01 2005
Subject: [Impute] Adding cases after imputation
Message-ID: <[email protected]>

Seems to me that your answer would depend on what method was used to derive the imputed value. In other words, would your imputed values change with the additional cases? For example, if you were using a bootstrapping routine to pick imputed values, the additional cases would change your pool of possible values. If so, then I would opt to reimpute the original missing values, on the presumption that the imputation would be improved.

Arline Nakanishi
Amgen Inc.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Cole, Jason Ph.D.
Sent: Wednesday, April 07, 2004 9:32 AM
To: Imputation Listserve ([email protected])
Subject: [Impute] Adding cases after imputation

From finch <@t> epi.umn.edu Fri Apr 9 17:03:51 2004
From: finch <@t> epi.umn.edu (Emily Finch)
Date: Sun Jun 26 08:25:01 2005
Subject: [Impute] Imputing interactions with time using Proc Mixed
Message-ID: <000001c41e7e$8ab4bdf0$09255...@epifinch>

I would like to impute some missing data on a set of repeated measures variables for the purpose of conducting a Proc Mixed analysis in SAS that includes an interaction with the "time" variable. The "time" variable is created when the data are structured so that there is one record per participant-time point.
I understand that because imputation is based on linear regression, imputed values do not reflect interactions; it is therefore best to either (1) divide the data and conduct imputation separately within each category of one of the variables in the interaction, or (2) form the product of the two variables in the interaction and include this product in the imputation model. However, the "time" variable does not exist until I structure the data so that there is one record per participant-time point, yet I can't properly impute the variables when they are structured for the mixed analysis. How do I get around this problem?

Thank you,
Emily Finch

*********************************************************************
Emily A. Finch, M.A.
Research Coordinator
Division of Epidemiology
University of Minnesota
1100 Washington Ave. S. #201
Minneapolis, MN 55415
(612)625-5895
[email protected]
**********************************************************************

From william.dupont <@t> Vanderbilt.Edu Fri Apr 9 17:23:04 2004
From: william.dupont <@t> Vanderbilt.Edu (Dupont, William)
Date: Sun Jun 26 08:25:01 2005
Subject: [Impute] Strata size for hotdeck multiple imputation
Message-ID: <291ef2d70eb7cb43b642323745ed5111d22...@mailbe09>

Dear Impute Listers,

I am currently doing an analysis with multiply imputed missing values using a hotdeck approach. We are using Rubin and Schenker's (1986) approximate Bayesian bootstrap to do the multiple imputations. In this approach, the data are divided into strata such that patients in the same stratum are similar with respect to the likely value of some covariate of interest. Imputed values of this covariate are then drawn from the observed values of this covariate for patients in the same stratum as the case to be imputed.
I have not been able to find any guidance in the literature as to how large the strata should be. Larger strata will produce a better distribution for the draws, while smaller strata will produce a group of patients who are more similar to the patient of interest. Do you have any recommendation as to how large the strata should be?

With many thanks for any advice you can give me.

Bill

William D. Dupont
phone: 615-322-2001
URL http://www.mc.vanderbilt.edu/prevmed/facstaff/dupont.htm
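
For readers following the hot-deck thread, the approximate Bayesian bootstrap described in the message above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used in the analysis: the function name `abb_impute`, its parameters, and the toy data are all hypothetical. Within each stratum it first resamples the observed values with replacement to form a donor pool, then draws each imputed value with replacement from that pool, and repeats the whole procedure once per imputation.

```python
import random
from collections import defaultdict


def abb_impute(values, strata, n_imputations=5, seed=0):
    """Return n_imputations completed copies of `values`.

    values: list in which None marks a missing entry (hypothetical layout).
    strata: list of stratum labels, same length as `values`.
    """
    rng = random.Random(seed)

    # Split row indices into observed and missing, grouped by stratum.
    observed = defaultdict(list)
    missing = defaultdict(list)
    for i, (v, s) in enumerate(zip(values, strata)):
        (missing if v is None else observed)[s].append(i)

    completed_sets = []
    for _ in range(n_imputations):
        completed = list(values)
        for s, miss_idx in missing.items():
            donors = [values[i] for i in observed[s]]
            if not donors:
                raise ValueError(f"stratum {s!r} has no observed donors")
            # Step 1: bootstrap the observed values to form a donor pool.
            pool = rng.choices(donors, k=len(donors))
            # Step 2: draw each imputed value, with replacement, from the pool.
            for i in miss_idx:
                completed[i] = rng.choice(pool)
        completed_sets.append(completed)
    return completed_sets


# Toy example: two strata, one missing value in each.
values = [1.0, 2.0, None, 10.0, None, 12.0]
strata = ["a", "a", "a", "b", "b", "b"]
imputed_sets = abb_impute(values, strata, n_imputations=3, seed=1)
```

A stratum with no observed values has no donors, so the sketch raises an error there; that is the hard lower limit on stratum size. Very small donor pools will also tend to repeat the same few values across imputations, which is one side of the trade-off raised in the question.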
