Date: 5 JUN 2002 22:36:06 -0700
From: Sapsi <[EMAIL PROTECTED]>

> We are doing a regression; however, there are plenty of missing values,
> which cannot be dropped, so they somehow have to be imputed.
> There are some options:
> a) Replace the problem regressor by a dummy variable
> b) Replace the missing by median (it is demographic data)
> c) Replace the missing data by the trimmed/winsorized mean.
> d) Replace by mode
> 
> Which is the better alternative?

It depends on the joint distribution function h(x,y).

You can get some insight from the (x,y) scatter plot and the normalized 
marginal histogram f(xi) (i.e., SUM(i=1,N){f(xi)} = 1).
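
As a rough illustration of the normalized marginal histogram, here is a 
minimal Python/NumPy sketch. The function name normalized_marginal, the 
use of NaN to mark missing values, and the choice of bin centers as the 
xi are all assumptions made for this example; none of them come from the 
original question.

```python
import numpy as np

def normalized_marginal(x, bins=10):
    """Return bin centers xi and normalized frequencies f(xi) of observed x.

    Missing values are assumed to be coded as NaN (an assumption of this
    sketch) and are ignored when building the histogram.
    """
    x = np.asarray(x, dtype=float)
    observed = x[~np.isnan(x)]
    counts, edges = np.histogram(observed, bins=bins)
    f = counts / counts.sum()                  # so that SUM f(xi) = 1
    centers = 0.5 * (edges[:-1] + edges[1:])   # one xi per bin
    return centers, f
```

Plotting f against the centers (and looking at the scatter plot of the 
complete cases) should show whether the marginal is multimodal or 
otherwise odd.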

If the marginal is multimodal or weird in some other way, one of 
the above choices probably won't work. 

If the missing values appear to be uncorrelated with y:

1. Reduce the number of bins in the histogram until you reach m, the 
   smallest number of bins you still consider representative. 
2. Use each of those m representative values in turn as the replacement 
   for the missing values, giving m regressions.
3. Average over the m sets of regression coefficients with weights f(xi) to 
   obtain the final result.
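
The three steps above could be sketched roughly as follows for a single 
regressor x with an intercept. This is only one reading of the procedure: 
the function name averaged_regression, the NaN coding of missing values, 
the use of bin centers as the representative values, and ordinary least 
squares as the regression are all assumptions of the example.

```python
import numpy as np

def averaged_regression(x, y, m=3):
    """Fit m regressions, one per representative fill-in value, and
    average the coefficients with the histogram weights f(xi).

    Assumes missing x values are coded as NaN and y is fully observed.
    Returns (averaged [intercept, slope], representative values, weights).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(x)

    # Step 1: coarse histogram of the observed x values with m bins.
    counts, edges = np.histogram(x[~missing], bins=m)
    centers = 0.5 * (edges[:-1] + edges[1:])   # representative values
    weights = counts / counts.sum()            # f(xi), sums to 1

    # Step 2: one least-squares fit per representative value.
    coefs = []
    for value in centers:
        x_filled = np.where(missing, value, x)
        design = np.column_stack([np.ones_like(x_filled), x_filled])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        coefs.append(beta)

    # Step 3: weighted average of the m coefficient vectors.
    return weights @ np.array(coefs), centers, weights
```

With no missing values every fit is identical, so the result reduces to 
the ordinary least-squares coefficients, which is a quick sanity check.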

I don't know if this will work if the missing values are correlated with y.

Greg

Hope this helps.

Gregory E. Heath     [EMAIL PROTECTED]      The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815        not necessarily shared by
Lexington, MA        (781) 981-0908(FAX)   M.I.T./LL or its sponsors
02420-9185, USA
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
