
[Impute] IVEware: Imputation of categorical variables with many categories

hhafner
Mon, 21 Sep 2009 08:09:01 -0700

Hello,

I have a dataset with about 50,000 records and 30 variables. Two of the
variables are categorical with many categories: federal state (16
categories) and branch of economic activity (80-100 categories). Since I
want to produce a synthetic dataset, I double the dataset by appending a
copy in which all values of one variable are set to missing.
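For readers who want to see the doubling step concretely, here is a minimal sketch in plain Python. It is only an illustration of the data layout, not IVEware syntax; the function name, the field names and the two sample records are made up for the example.

```python
def double_with_missing(records, target, missing=None):
    """Return the original records followed by copies in which
    the `target` variable is blanked out (set to `missing`)."""
    blanked = []
    for rec in records:
        dup = dict(rec)        # shallow copy, original stays untouched
        dup[target] = missing  # this value is what the imputation fills in
        blanked.append(dup)
    return list(records) + blanked


# Hypothetical sample records, for illustration only.
records = [
    {"state": "Hessen", "branch": 47, "employees": 12},
    {"state": "Bayern", "branch": 10, "employees": 250},
]

stacked = double_with_missing(records, "state")
```

After imputation, the second half of the stacked file (the rows whose target was blanked) would form the synthetic records for that variable.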

Now to my problem with IVEware: if I try to impute, for example, the
federal state, the first iteration is still running after 5-6 hours, which
is too long.
My second attempt: I computed dummy variables for the 16 federal states.
I first impute the state with the most units, then the state with the
second most units, and so on. On the whole this works well, but only 20-30
units remain for the last state (original data: 358 units). I tried
swapping the order of the smallest and the second smallest state, but this
did not solve the problem: now the second smallest state has far too few
units in the synthetic dataset. Does anyone have further suggestions for
handling categorical variables with many categories in IVEware?
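The arithmetic behind the sequential dummy scheme can be sketched as follows. When the categories are imputed one after another, each dummy has to reproduce the share of its category among the units not yet assigned, and whatever is left over falls to the last category. The sketch below is plain Python, says nothing about IVEware's internals, and uses made-up category counts; only the total of 50,000 and the 358 units of the smallest state come from the description above.

```python
def conditional_shares(counts):
    """counts: category sizes in imputation order.
    Returns, for each category k, its share among the units
    that do not belong to any earlier category."""
    remaining = sum(counts)
    shares = []
    for n in counts:
        shares.append(n / remaining)
        remaining -= n
    return shares


def expected_counts(shares, total):
    """Chain the conditional shares back into expected category sizes."""
    remaining = total
    out = []
    for p in shares:
        n = p * remaining
        out.append(n)
        remaining -= n
    return out


# Hypothetical counts for six categories, largest first (sum = 50000).
counts = [20000, 12000, 9000, 5000, 3642, 358]

cond = conditional_shares(counts)
expected = expected_counts(cond, sum(counts))
```

Comparing the share each dummy actually receives in the synthetic data with these conditional shares might show at which step units are lost. Because the last category only receives the remainder, any over-assignment in earlier steps would accumulate there, which is one possible reading of the shortfall described above.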


Kind regards
Hans-Peter Hafner

STATISTIK HESSEN

-----------
Hessisches Statistisches Landesamt
Rheinstraße 35/37
65175 Wiesbaden
Internet: http://www.statistik-hessen.de

Telefon: 0611 3802-815
Telefax: 0611 3802-890
E-Mail: hhaf...@statistik-hessen.de
_______________________________________________
Impute mailing list
Impute@lists.utsouthwestern.edu
http://lists.utsouthwestern.edu/mailman/listinfo/impute