Check out chapter 3 of Frank Harrell's book, "Regression Modeling Strategies", for some guidance: http://books.google.com/books?id=kfHrF-bVcvQC&q=missing+data#v=snippet&q=missing%20data&f=false
The gist is: avoid simply substituting the mean; avoid deleting incomplete cases if you can, since this can lead to biased predictions; graphically explore the pattern of missing values in your variables before imputing (e.g., is the missingness concentrated in the tails, or in particular subjects?); and analyze how sensitive your model is to different imputations of the missing values (e.g., with a bootstrap, do the results change radically under different imputed values?). In any case, you will have to tailor your imputation to the nature of the variables and the pattern of "missingness".

As for software, the R package "Amelia" was built just for this purpose:
http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf

A brief explanation of Amelia can also be found here:
http://stats.stackexchange.com/questions/47247/multiple-imputation-with-the-amelia-package

-Cody

On Mon, Jun 2, 2014 at 9:56 AM, ling huang <[email protected]> wrote:

> Hi
>
> I had a question sent and was interested and hoped somebody could direct
> me to the relevant article and software etc.
> Suppose I have a dependent variable and a set of independent X1, X2, X3,
> X4 (could be more). X1 takes on specific values e.g., 2,4, 6, 8 ppm
> (concentration values). X2 takes on specific values such as temperature to
> nearest 5 degrees (say 40, 45,60) etc. X3 specific and X4 specific.
>
> Suppose I have all readings / combinations except for example the X2 = 4
> and X2 = 50 readings (i.e., blocks of information missing). How can I best
> estimate these missing values?
>
> Thanks
>
> Ling
> Ling Huang
> SCC
>
> http://huangl.webs.com
>
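
To make the two warnings in the reply concrete (don't just plug in the mean, and check how sensitive your results are to the imputed values), here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the readings in `y` are made up, the helper names are hypothetical, and the random "hot-deck" draw from the observed values is a deliberately simple stand-in, not the model-based multiple imputation that Amelia performs.

```python
import random
import statistics

def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def hot_deck_impute(values, rng):
    """Replace each None with a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

def sensitivity(values, statistic, n_imputations=200, seed=1):
    """Recompute `statistic` over many randomly imputed copies of the data."""
    rng = random.Random(seed)
    return [statistic(hot_deck_impute(values, rng)) for _ in range(n_imputations)]

# Hypothetical readings (not from the original question); None marks a missing value.
y = [2.1, 3.9, None, 8.2, 5.0, None, 6.1, 4.4, None, 7.3]

observed_sd = statistics.stdev([v for v in y if v is not None])
mean_imputed_sd = statistics.stdev(mean_impute(y))
draws = sensitivity(y, statistics.stdev)

print("SD of observed values:      ", round(observed_sd, 3))
print("SD after mean imputation:   ", round(mean_imputed_sd, 3))
print("SD range across imputations:", round(min(draws), 3), "to", round(max(draws), 3))
```

Mean imputation always shrinks the standard deviation here, because the filled-in points add nothing to the sum of squares while the sample size grows. The spread of the statistic across the random imputations gives a rough sense of how much the answer depends on the values you fill in. For a real analysis you would replace `statistics.stdev` with your fitted model's quantity of interest and use a proper multiple-imputation method.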
