> From: Drechsler Jörg <[email protected]>
> Subject: [Impute] Imputation under logical constraints
> To: <[email protected]>

> Now let's start with an easy case: I have to make sure that 
> the condition y1<=y2 always holds for my imputed values. In 
> this case, I compute y1 as the fraction of y2 for the 
> observed part of the data and impute these fractions instead 
> of the real values and whenever my imputed values are outside 
> the bounds [0%;100%], I simply redraw the value for this 
> observation until the condition is fulfilled.
> 
> Any other ideas how to do that?

If 0<Y1<Y2 (strictly) in the dataset, you could model log(Y2) and
logit(Y1/Y2) as continuous normal variables.  This begins to break down
if sometimes Y1=0 or Y1=Y2, although you could kludge your way through
by recoding Y1/Y2 as epsilon and 1-epsilon respectively when it is
actually 0 or 1.
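
As a rough illustration, here is a minimal Python sketch of that
transformation and its inverse.  The function names and the value of
EPS are my own placeholders, not part of any imputation package; the
imputation model itself (whatever you fit to the transformed values) is
left out.

```python
import math

EPS = 1e-3  # hypothetical choice: how far to nudge ratios of exactly 0 or 1

def to_unconstrained(y1, y2, eps=EPS):
    """Map (y1, y2) with 0 <= y1 <= y2 and y2 > 0 to (log y2, logit(y1/y2))."""
    p = y1 / y2
    p = min(max(p, eps), 1.0 - eps)  # recode 0 as eps and 1 as 1 - eps
    return math.log(y2), math.log(p / (1.0 - p))

def to_constrained(log_y2, logit_p):
    """Invert the transform; the result satisfies y1 <= y2 by construction."""
    y2 = math.exp(log_y2)
    p = 1.0 / (1.0 + math.exp(-logit_p))  # inverse logit, always in [0, 1]
    return p * y2, y2
```

Any pair of real numbers drawn on the transformed scale maps back to a
pair with y1 <= y2, so no redrawing is needed.  Note that a record with
Y1=0 comes back as eps*Y2 rather than exactly 0, which is the price of
the kludge.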

> It gets more difficult if I have to make sure that the 
> condition y.total=y1+y2+y3 is fulfilled. If I just impute 
> y1,y2, and y3 and then simply define y.total=y1+y2+y3 I 
> expect that I will overestimate the total number. Another 
> idea would be to impute all the variables independently and 
> then downweight y1, y2 and y3 to make sure that the above 
> condition is fulfilled. But I find neither of the two ideas 
> to be satisfying. 
> 
> Are there other ways to do it?

Similar to above, you could transform to y.total, log(y2/y1) and
log(y3/y1).  (This appears to give a special role to y1 but actually the
treatment is symmetrical.)  Same caveats as above.
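
A minimal Python sketch of this log-ratio transformation, assuming all
three components are strictly positive (the function names are
placeholders of mine):

```python
import math

def to_log_ratios(y1, y2, y3):
    """Map positive components to (total, log(y2/y1), log(y3/y1))."""
    return y1 + y2 + y3, math.log(y2 / y1), math.log(y3 / y1)

def from_log_ratios(total, r2, r3):
    """Recover (y1, y2, y3); they sum to total by construction."""
    denom = 1.0 + math.exp(r2) + math.exp(r3)
    y1 = total / denom
    return y1, y1 * math.exp(r2), y1 * math.exp(r3)
```

On the symmetry: taking y2 as the reference instead gives log(y1/y2) =
-log(y2/y1) and log(y3/y2) = log(y3/y1) - log(y2/y1), which is an
invertible linear map of the same two log-ratios, so a multivariate
normal model for one set is equivalent to one for the other.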

> Things start to get real funny, if the above conditions also 
> have to be fulfilled for subpopulations. Say y.total is the 
> total number of employees and y1,y2,and y3 are number of 
> employees for different levels of qualification. What if the 
> question is: How many of these employees are females?
> 
> Then I have to make sure that         y.total=y1+y2+y3 
>                               y.total.f=y1.f+y2.f+y3.f
>                               y.total.f<=y.total
>                               y1.f<=y1
>                               y2.f<=y2
>                               y3.f<=y3

You could impute men and women separately.  Alternatively you could
model a set of variables like y.total, logit(y.men/y.total), and then
proportions within each sex as above.
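
The second option could be sketched in Python roughly as follows,
assuming every count is strictly positive.  The dictionary keys and
function names are invented for illustration only.

```python
import math

def decompose(men, women):
    """men, women: positive counts (y1, y2, y3) for each sex."""
    total_m, total_f = sum(men), sum(women)
    total = total_m + total_f
    share_m = total_m / total
    return {
        "total": total,
        "logit_men": math.log(share_m / (1.0 - share_m)),
        # log-ratios to the first qualification level, within each sex
        "men": [math.log(y / men[0]) for y in men[1:]],
        "women": [math.log(y / women[0]) for y in women[1:]],
    }

def recompose(z):
    """Invert decompose(); returns (men, women) as lists of counts."""
    share_m = 1.0 / (1.0 + math.exp(-z["logit_men"]))

    def split(subtotal, ratios):
        weights = [1.0] + [math.exp(r) for r in ratios]
        scale = subtotal / sum(weights)
        return [w * scale for w in weights]

    men = split(z["total"] * share_m, z["men"])
    women = split(z["total"] * (1.0 - share_m), z["women"])
    return men, women
```

Defining each y_k as the sum of the recomposed male and female counts
then satisfies all of the listed constraints automatically, since every
piece is positive and the pieces add up to y.total.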

These sorts of transformations are just one set of possibilities that
should be easy to implement but of course it all depends on the
distributions of the data and the nature of the relationships --
choosing a good imputation model is a data analysis problem and no one
size fits all!

        Alan Zaslavsky
        Harvard Medical School
