Maria da Conceicao-Saraiva wrote:
> Sorry about this question.
>
> I have been discussing with some of the people I work with whether some of our data need imputation. Some of the analysts simply create a category for missing values inside some variables, and they argue this is enough. It has been hard to convince them that this is not the best approach, especially for our income variable, where about 30% of the values are missing.
>
> Does anybody know of references discussing this approach of simply creating a category for missing values inside a variable?
>
> Maria
Maria,

That approach is a disaster; it fails even when values are missing completely at random. Several papers on the subject are referenced in

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf

See http://biostat.mc.vanderbilt.edu/rms for more information.

One easy way to see why it is a disaster is to realize that adding a new category changes the definition of the variable. A test of association between the new variable and the dependent variable Y is then a joint test of whether the effect of the original variable or the missingness of that variable is associated with Y, so a significant result cannot be attributed to the original variable alone.

It is amazing how many analysts create a method of analyzing data without ever even being tempted to study the performance of the method.

Frank
--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
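The claim that the missing-category approach fails even under MCAR is easy to check by simulation, in the spirit of studying a method's performance before using it. The Python sketch below is illustrative only: the thread contains no code. The data-generating model is a hypothetical assumption: a fully observed covariate x1 correlated with a three-level income category, a true x1 effect of 1.0, and 30% of income values missing completely at random (echoing Maria's 30%). Every name in it (`simulate`, `fit_ols`) is also hypothetical. It compares the x1 coefficient from the missing-category approach with a complete-case fit. Under MCAR the complete-case fit is unbiased but discards data, so it is used here only as a reference point, not as a recommended alternative to imputation.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate(n=100_000, p_missing=0.3, rho=0.6, seed=1):
    """Hypothetical setup: estimate the x1 effect (true value 1.0) two ways."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)  # fully observed covariate
    # Latent income correlated with x1, cut at its tertiles into categories 0, 1, 2.
    z = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    cuts = np.quantile(z, [1 / 3, 2 / 3])
    income = np.searchsorted(cuts, z)
    # Outcome: x1 effect 1.0, each income step adds 1.0, plus noise.
    y = 1.0 * x1 + income + rng.standard_normal(n)
    # MCAR: whether income is missing is unrelated to x1, income, and y.
    missing = rng.random(n) < p_missing

    # Missing-category approach: income gets a 4th level (3 = "missing"),
    # dummy-coded against category 0.
    cat = np.where(missing, 3, income)
    dummies = np.column_stack([(cat == k).astype(float) for k in (1, 2, 3)])
    X_cat = np.column_stack([np.ones(n), x1, dummies])
    b1_missing_cat = fit_ols(X_cat, y)[1]

    # Complete-case analysis on rows with observed income, for comparison.
    obs = ~missing
    dummies_cc = np.column_stack(
        [(income[obs] == k).astype(float) for k in (1, 2)]
    )
    X_cc = np.column_stack([np.ones(obs.sum()), x1[obs], dummies_cc])
    b1_complete_case = fit_ols(X_cc, y[obs])[1]

    return b1_missing_cat, b1_complete_case


if __name__ == "__main__":
    b_cat, b_cc = simulate()
    print(f"missing-category estimate of x1 effect: {b_cat:.3f}")
    print(f"complete-case estimate of x1 effect:    {b_cc:.3f}")
```

Because missingness here is a pure coin flip, any gap between the two estimates comes from the method itself. In the rows coded as missing, the x1 slope has to absorb the income effect that can no longer be adjusted for. The missing-category estimate therefore drifts above 1.0, while the complete-case estimate stays near it. Rerunning with a different `seed`, `rho`, or `p_missing` is a cheap way to study the method's performance before adopting it.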
