Hi Joerg and others,

In 2007 Caren Tempelman completed a doctoral thesis, "Imputation of Restricted 
Data", in which she describes various ways to impute numerical 
data under logical restrictions. In particular, she examines the use of a 
truncated normal distribution (Chapter 5) and a sequential regression approach 
(Chapter 6). The thesis can be found on the web site of Statistics Netherlands, 
see http://www.cbs.nl/en-GB/menu/methoden/research/rapporten/default.htm.
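For readers unfamiliar with the truncated normal approach, here is a minimal sketch of drawing one value from a normal distribution restricted to an interval [lower, upper] with the inverse-CDF method. It assumes the mean and standard deviation come from some fitted imputation model; the helper name `draw_truncated_normal` is hypothetical, and this is not the algorithm from the thesis.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, lower, upper, rng=random):
    """Draw one value from N(mean, sd^2) restricted to [lower, upper].

    Hypothetical helper: inverse-CDF sampling, assuming sd > 0 and finite
    bounds that are not far out in the tails (there the two CDF values
    can round to the same number).
    """
    dist = NormalDist(mean, sd)
    # Probability mass below each bound
    p_lo = dist.cdf(lower)
    p_hi = dist.cdf(upper)
    # A uniform draw between p_lo and p_hi, mapped back through the
    # inverse CDF, is a draw from the truncated distribution
    u = rng.uniform(p_lo, p_hi)
    x = dist.inv_cdf(u)
    # Guard against floating-point round-off at the bounds
    return min(max(x, lower), upper)


rng = random.Random(2008)
draws = [draw_truncated_normal(50.0, 20.0, 0.0, 100.0, rng) for _ in range(5)]
print(draws)
```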

At Statistics Netherlands we have also experimented a bit with some other 
approaches.

For the UN/ECE work session on statistical data editing (Vienna, April 2008) 
Natalie Shlomo (University of Southampton), Jeroen Pannekoek and I are 
preparing a short paper (or rather, trying to prepare one) on imputing data 
under logical restrictions where, at the same time, the sum (over all records) 
of the imputed values has to equal a known total.

Needless to say, imputation under logical restrictions is a very hard problem, 
and in the end the quality of the imputed data depends on the quality of your 
imputation model.

Best wishes,
Ton de Waal

-----Original message-----
From: [email protected] 
[mailto:[email protected]] On behalf of Drechsler Jörg
Sent: Tuesday 5 February 2008 17:37
To: [email protected]
Subject: [Impute] Imputation under logical constraints


Hi all,

I have some questions for imputation under logical constraints:

I am multiply imputing missing values for variables from an establishment 
survey using sequential regression.

Now let's start with an easy case: I have to make sure that the condition 
y1 <= y2 always holds for my imputed values. In this case, I express y1 as a 
fraction of y2 for the observed part of the data and impute these fractions 
instead of the actual values. Whenever an imputed fraction falls outside 
[0%, 100%], I simply redraw the value for that observation until the 
condition is fulfilled.
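The ratio-and-redraw approach described above can be sketched as follows. Redrawing until the fraction lies in [0, 1] is rejection sampling, so when the fraction is modelled as normal, the accepted draws follow a normal distribution truncated to [0, 1]. The inputs `pred_fraction` and `resid_sd` are assumptions standing in for the prediction and residual standard deviation of whatever regression model is used for y1/y2 in the sequential procedure; y2 is assumed to be already imputed and nonnegative, and the helper name is hypothetical.

```python
import random


def impute_y1_given_y2(y2, pred_fraction, resid_sd, rng=random, max_tries=1000):
    """Impute y1 so that 0 <= y1 <= y2 (hypothetical helper).

    Draws the fraction y1/y2 from N(pred_fraction, resid_sd^2) and redraws
    until it lies in [0, 1]. Assumes y2 >= 0.
    """
    for _ in range(max_tries):
        frac = rng.gauss(pred_fraction, resid_sd)
        if 0.0 <= frac <= 1.0:
            return frac * y2
    # If the predicted fraction is far outside [0, 1], almost every draw
    # is rejected; fail loudly rather than loop forever
    raise RuntimeError("no admissible draw; check the model for y1/y2")


rng = random.Random(42)
y2 = 120.0
y1 = impute_y1_given_y2(y2, pred_fraction=0.7, resid_sd=0.2, rng=rng)
print(y1, y1 <= y2)
```

The `max_tries` cap is a design choice: with a poor model for the fraction the acceptance rate can become very low, in which case drawing directly from the truncated normal avoids the wasted draws.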

Any other ideas how to do that?


It gets more difficult if I have to make sure that the condition 
y.total = y1 + y2 + y3 is fulfilled. If I just impute y1, y2, and y3 and then 
simply define y.total = y1 + y2 + y3, I expect to overestimate the total. 
Another idea would be to impute all the variables independently and then 
downweight y1, y2, and y3 so that the above condition is fulfilled. But I find 
neither idea satisfying. 
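The second idea (impute everything, then adjust the components to the imputed total) might look like this for a single record. If y1, y2, and y3 are counts, as in the employee example further on, plain proportional rescaling yields fractions, so this sketch adds largest-remainder rounding to keep integer counts that still add up exactly to y.total. The helper name `rescale_counts` and the rounding step are illustrative assumptions, not something proposed in this thread.

```python
import math


def rescale_counts(components, total):
    """Proportionally rescale nonnegative imputed counts so they add up to
    the integer total, then round with the largest-remainder method
    (hypothetical helper)."""
    s = sum(components)
    if s <= 0:
        raise ValueError("components must have a positive sum")
    scaled = [c * total / s for c in components]
    floors = [math.floor(x) for x in scaled]
    # Units lost by rounding down; between 0 and len(components)
    shortfall = total - sum(floors)
    # Hand the missing units to the components with the largest fractional parts
    order = sorted(range(len(scaled)), key=lambda i: scaled[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


# Imputed y1, y2, y3 and an independently imputed y.total for one record
print(rescale_counts([12, 30, 9], 45))
```

Note that this only moves the problem to the imputation of y.total: if that imputation is biased, so are the rescaled components.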

Are there other ways to do it?


Things start to get really funny if the above conditions also have to be 
fulfilled for subpopulations. Say y.total is the total number of employees and 
y1, y2, and y3 are the numbers of employees at different qualification levels. 
What if the question is: how many of these employees are female?

Then I have to make sure that

    y.total   =  y1 + y2 + y3
    y.total.f =  y1.f + y2.f + y3.f
    y.total.f <= y.total
    y1.f      <= y1
    y2.f      <= y2
    y3.f      <= y3
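When trying out any of these approaches, it can help to check each imputed record mechanically against the edits listed above. A minimal sketch, assuming each record is held in a small dataclass whose field names (y_total, y1_f, and so on) are hypothetical renderings of y.total, y1.f, etc.:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Employee counts for one establishment (hypothetical field names)."""
    y1: int
    y2: int
    y3: int
    y_total: int
    y1_f: int
    y2_f: int
    y3_f: int
    y_total_f: int


def violated_edits(r: Record) -> list[str]:
    """Return the edits that record r fails; an empty list means consistent."""
    checks = {
        "y.total = y1+y2+y3": r.y_total == r.y1 + r.y2 + r.y3,
        "y.total.f = y1.f+y2.f+y3.f": r.y_total_f == r.y1_f + r.y2_f + r.y3_f,
        "y.total.f <= y.total": r.y_total_f <= r.y_total,
        "y1.f <= y1": r.y1_f <= r.y1,
        "y2.f <= y2": r.y2_f <= r.y2,
        "y3.f <= y3": r.y3_f <= r.y3,
    }
    return [name for name, ok in checks.items() if not ok]


rec = Record(y1=5, y2=3, y3=2, y_total=10, y1_f=2, y2_f=4, y3_f=1, y_total_f=7)
print(violated_edits(rec))
```

The edit y.total.f <= y.total is implied by the other five: if both sum edits hold and each yk.f <= yk, then y.total.f = y1.f + y2.f + y3.f <= y1 + y2 + y3 = y.total. So an imputation scheme that enforces the two sums and the three component bounds satisfies all six.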


I am in real trouble here and any ideas or comments are highly appreciated.


Joerg

Institute for Employment Research
Nuremberg, Germany


_______________________________________________
Impute mailing list
[email protected] 
http://lists.utsouthwestern.edu/mailman/listinfo/impute

---------------
No rights may be derived from the contents of this e-mail message.
The information in this e-mail message is intended only for the addressee. 
Statistics Netherlands cannot vouch for the correctness and completeness of the 
contents of e-mail messages, nor for the timely receipt thereof.
