From allison <@t> soc.upenn.edu Sat Jul 4 10:04:19 2009
From: allison <@t> soc.upenn.edu (Paul Allison)
Date: Sat Jul 4 10:04:28 2009
Subject: [Impute] weird question
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

I completely agree with Frank Harrell that this is not, in general, a good method. I haven't checked out all his references but, for me, the definitive refutation was Jones' 1996 paper in the Journal of the American Statistical Association.

Nevertheless, I still believe that this method may be useful in two situations:

1. Data are "missing" because a variable doesn't apply or is undefined for some fraction of cases. For example, suppose you have a measure of marital happiness, dichotomized as high or low, but your sample contains some unmarried people. Then it is entirely appropriate to have a 3-category variable with values high, low, and unmarried.

2. The goal is to build a forecasting model, and it is anticipated that a substantial fraction of the new cases to be forecast will have missing data on one or more variables. Here, the goal is not to get unbiased estimates of population parameters but to minimize some function of prediction errors. A workable forecasting model must have some way of dealing with the cases that have missing data. Maybe there are better ways, but I've found almost no literature on this topic (with the exception of Warren Sarle's unpublished paper).

-----------------------------------------------------------------
Paul D. Allison, Professor
Department of Sociology
University of Pennsylvania
581 McNeil Building
3718 Locust Walk
Philadelphia, PA 19104-6299
215-898-6717
215-573-2081 (fax)
http://www.ssc.upenn.edu/~allison

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Maria da Conceicao-Saraiva
Sent: Saturday, July 04, 2009 9:19 AM
To: [email protected]
Subject: [Impute] weird question

Sorry about this question. I have been discussing with some of the people I work with whether some of our data need imputation. What some of the analysts are doing is just creating a category for missing values within some variables; they argue this is enough.
It has been hard to convince them that this is not the best approach. In our income variable in particular, about 30% of the values are missing. Does anybody know of references discussing this approach of just creating a category for missing values within a variable?

Maria

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maria da Conceicao P. Saraiva DDS, MSc, Ph.D
Departamento de Clinica Infantil e Odontologia Social e Preventiva
Faculdade de Odontologia de Ribeirao Preto-Universidade de Sao Paulo

_______________________________________________
Impute mailing list
[email protected]
http://lists.utsouthwestern.edu/mailman/listinfo/impute
