Check out chapter 3 of Frank Harrell's book, "Regression Modeling
Strategies", for some guidance:
http://books.google.com/books?id=kfHrF-bVcvQC&q=missing+data#v=snippet&q=missing%20data&f=false

The gist of it: avoid simply filling in the mean; avoid deleting
incomplete cases if you can (this can bias your estimates and
predictions); explore graphically how the missing values are distributed
before you impute (e.g., are they concentrated in the tails, or in
particular subjects?); and check how sensitive your model is to the
imputation (e.g., with a bootstrap or repeated imputations, do the
results change radically when different values are filled in?).
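
To make the sensitivity check concrete, here is a rough Python sketch
(NumPy only). Everything in it is my own assumption for illustration: the
data are synthetic, the names (x1, x2, fit, ...) are made up, and the
"stochastic regression imputation" step is a simplified stand-in for
proper multiple imputation, which would also account for uncertainty in
the imputation model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, invented for illustration: y depends on two correlated predictors.
n = 300
x1 = rng.normal(0.0, 1.0, size=n)
x2 = 0.6 * x1 + rng.normal(0.0, 0.8, size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(0.0, 1.0, size=n)

# Knock out x2 more often when x1 is large, so missingness is not completely at random.
p_miss = np.where(x1 > 0.0, 0.45, 0.15)
missing = rng.random(n) < p_miss
keep = ~missing


def fit(x1v, x2v, yv):
    """Least-squares coefficients [intercept, b1, b2] of yv on x1v and x2v."""
    design = np.column_stack([np.ones_like(x1v), x1v, x2v])
    coef, *_ = np.linalg.lstsq(design, yv, rcond=None)
    return coef


# Option 1: delete incomplete cases.
coef_deleted = fit(x1[keep], x2[keep], y[keep])

# Option 2: fill every gap with the observed mean of x2.
x2_mean = np.where(missing, x2[keep].mean(), x2)
coef_mean = fit(x1, x2_mean, y)

# Option 3: predict x2 from x1 and y on the complete cases, add residual
# noise, and repeat so the spread across imputations becomes visible.
aux = np.column_stack([np.ones(keep.sum()), x1[keep], y[keep]])
gamma, *_ = np.linalg.lstsq(aux, x2[keep], rcond=None)
resid_sd = np.std(x2[keep] - aux @ gamma, ddof=3)
aux_miss = np.column_stack([np.ones(missing.sum()), x1[missing], y[missing]])

coefs_draws = []
for _ in range(20):
    x2_imp = x2.copy()
    noise = rng.normal(0.0, resid_sd, size=missing.sum())
    x2_imp[missing] = aux_miss @ gamma + noise
    coefs_draws.append(fit(x1, x2_imp, y))
coefs_draws = np.array(coefs_draws)

print("deletion             :", np.round(coef_deleted, 3))
print("mean imputation      :", np.round(coef_mean, 3))
print("repeated draws, mean :", np.round(coefs_draws.mean(axis=0), 3))
print("repeated draws, sd   :", np.round(coefs_draws.std(axis=0, ddof=1), 3))
```

If the coefficients move a lot between the three approaches, or the
spread across the repeated draws is large relative to the effect you care
about, your conclusions depend heavily on how the gaps were filled.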

In any case, you will have to tailor your imputation to the nature of
the variables and the pattern of "missingness".
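
For a designed experiment like yours, one quick way to look at the
pattern is to tabulate which factor combinations were never run. A small
Python sketch follows; the factor levels and the set of observed runs are
hypothetical, loosely modelled on the example in your message, so swap in
your own.

```python
from collections import Counter
from itertools import product

# Hypothetical factor levels (assumptions, not the real design).
x1_levels = [2, 4, 6, 8]           # concentration, ppm
x2_levels = [40, 45, 50, 55, 60]   # temperature, nearest 5 degrees

# Hypothetical set of runs that were actually observed: every combination
# except the whole x2 = 50 block and the single cell (4, 45).
all_cells = list(product(x1_levels, x2_levels))
observed = {c for c in all_cells if c[1] != 50 and c != (4, 45)}

# Which cells are missing, and how are they spread over each factor?
missing_cells = [c for c in all_cells if c not in observed]
missing_by_x1 = Counter(a for a, _ in missing_cells)
missing_by_x2 = Counter(b for _, b in missing_cells)

# Levels of x2 with no observations at all.
empty_levels = [b for b in x2_levels if missing_by_x2[b] == len(x1_levels)]

# Print a grid: "x" = observed, "." = missing.
print("x1\\x2 " + " ".join(f"{b:>3}" for b in x2_levels))
for a in x1_levels:
    row = " ".join(f"{'x' if (a, b) in observed else '.':>3}" for b in x2_levels)
    print(f"{a:>5} {row}")
print("x2 levels never observed:", empty_levels)
```

Scattered single gaps and an entire missing level are different
situations: for a level with no readings at all, any fill-in rests on a
model relating the response to that factor rather than on similar
observed cases.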

As for software, the R package "Amelia" was built for multiple imputation
of missing data:
http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf

A brief explanation of Amelia can also be found here:
http://stats.stackexchange.com/questions/47247/multiple-imputation-with-the-amelia-package

-Cody


On Mon, Jun 2, 2014 at 9:56 AM, ling huang <[email protected]> wrote:

> Hi
>
> I had a question sent and was interested and hoped somebody could direct
> me to the relevant article and software etc.
> Suppose I have a dependent variable and a set of independent X1, X2, X3,
> X4 (could be more). X1 takes on specific values e.g., 2,4, 6, 8 ppm
> (concentration values). X2 takes on specific values such as temperature to
> nearest 5 degrees (say 40, 45,60) etc. X3 specific and X4 specific.
>
> Suppose I have all readings / combinations except for example the X2 = 4
> and X2 = 50 readings (i.e., blocks of information missing). How can I best
> estimate these missing values?
>
> Thanks
>
> Ling
> Ling Huang
> SCC
>
> http://huangl.webs.com
>
