I am soliciting suggestions for published references or insight regarding the use of multiple imputation (MI) in complex probability samples (i.e., weighted databases). The difficulty I'm encountering is how to build an imputation model that appropriately reflects the complex sample design while still allowing appropriate variance calculations. Ideally, I'd like to use SAS to perform MI (PROC MI), then use SUDAAN to perform the analysis (accounting for the sample design), and then combine the results in SAS (PROC MIANALYZE). Many of the national complex samples have used hot-deck imputation or single imputation with a regression model, but I have yet to find a detailed description of MI in a complex sample.
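The combining step that PROC MIANALYZE performs follows Rubin's rules. A minimal sketch in Python (not SAS), assuming each of the m imputed data sets has already been analyzed in SUDAAN and yields one scalar point estimate plus a design-based standard error. `pool_rubin` is a hypothetical helper name. It uses the classic Rubin degrees of freedom, without the small-sample Barnard-Rubin adjustment, and it only pools results: it does nothing to make the imputation model reflect the design.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed data set
    variances: squared standard errors U_1..U_m (e.g. design-based SEs from SUDAAN)
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance of q_bar
    if b == 0:
        df = math.inf                # imputations agree exactly: no added uncertainty
    else:
        # Rubin (1987) df: (m - 1) * (1 + 1/r)^2 with r = (1 + 1/m) * b / w
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For example, estimates of 10, 12, and 11 with a design-based variance of 4 each give a pooled estimate of 11, a total variance of about 5.33, and about 32 degrees of freedom.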
These are the ideas I've come up with so far:

1) Modified bootstrapping technique: expand the sample to its full weighted size, then draw repeated simple random samples (each equal in size to the original unweighted sample) and perform MI in the standard fashion on each of these samples, continuing until every original observation has had its missing values imputed. Benefit: still allows analysis with SUDAAN. Potential problem: the variance calculation in the MI process may be inappropriate.

2) Separate the sample by primary sampling unit (PSU) and by stratum within each PSU, then perform MI individually within each of these separated cells. Benefit: accounts for most of the sample design. Potential problem: may be limited by small sample sizes within strata, and fails to use all of the data in the database for the MI process.

Any thoughts?

Thanks.
Craig

Craig D. Newgard, MD, MPH
Assistant Professor
Department of Emergency Medicine
Department of Public Health & Preventative Medicine
Oregon Health & Science University
3181 Sam Jackson Park Road
Mail Code CR-114
Portland, OR 97201-3098
(503) 494-1668 (Office)
(503) 494-4640 (Fax)
[email protected]
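The resampling step of idea 1 can be sketched in Python as follows, assuming the weights are stored alongside each record and that rounding each weight to an integer count is acceptable. `expand_by_weight` and `draws_until_covered` are hypothetical names, and the imputation itself (PROC MI in the original plan) is not shown.

```python
import random

def expand_by_weight(records, weights):
    """Build a pseudo-population by repeating each record about `weight` times.

    Each entry is (original_index, record) so values imputed in a draw can be
    mapped back to the original observation. Weights are rounded with Python's
    round(), with a floor of one copy so that no record is lost.
    """
    pseudo_pop = []
    for idx, (rec, w) in enumerate(zip(records, weights)):
        pseudo_pop.extend([(idx, rec)] * max(1, round(w)))
    return pseudo_pop

def draws_until_covered(records, weights, max_draws=1000, seed=None):
    """Repeatedly draw simple random samples (without replacement) of the
    original unweighted size n from the pseudo-population until every original
    record has appeared in at least one draw, or max_draws is reached.

    Returns (list_of_draws, indices_never_drawn).
    """
    rng = random.Random(seed)
    pseudo_pop = expand_by_weight(records, weights)
    n = len(records)
    unseen = set(range(n))
    draws = []
    while unseen and len(draws) < max_draws:
        draw = rng.sample(pseudo_pop, n)   # one SRS of size n
        draws.append(draw)
        unseen -= {idx for idx, _ in draw}
        # In the plan described above, PROC MI would be run on each draw here.
    return draws, unseen
```

With realistic survey weights (often in the hundreds or thousands) the pseudo-population can reach millions of rows. The same original record can also appear several times within one draw and receive different imputed values, and the draws ignore the PSU and stratum structure, which is where the variance concern in idea 1 comes from.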
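For idea 2, the sketch below groups records into cells by stratum and PSU and imputes separately within each cell. It swaps in the approximate Bayesian bootstrap (a hot-deck variant that first bootstraps the observed values to reflect uncertainty about the donor pool) in place of the model PROC MI would fit. The dict keys `stratum` and `psu`, the use of `None` for missing values, and the name `abb_impute_within_cells` are all hypothetical. Cells with no observed donors are returned so the small-cell problem is visible.

```python
import random
from collections import defaultdict

def abb_impute_within_cells(rows, var, m=5, seed=None):
    """Create m completed copies of `rows`, imputing `var` separately within
    each (stratum, psu) cell by the approximate Bayesian bootstrap (ABB).

    rows: list of dicts with hypothetical keys "stratum", "psu", and `var`;
          a value of None marks a missing item.
    Returns (completed_datasets, cells_without_donors).
    """
    rng = random.Random(seed)
    cells = defaultdict(list)                      # (stratum, psu) -> row indices
    for i, row in enumerate(rows):
        cells[(row["stratum"], row["psu"])].append(i)

    completed, no_donor_cells = [], set()
    for _ in range(m):
        copy = [dict(r) for r in rows]             # originals stay untouched
        for cell, idxs in cells.items():
            observed = [rows[i][var] for i in idxs if rows[i][var] is not None]
            missing = [i for i in idxs if rows[i][var] is None]
            if not missing:
                continue
            if not observed:
                no_donor_cells.add(cell)           # nothing to draw from in this cell
                continue
            # ABB step 1: bootstrap the observed values to reflect donor uncertainty
            boot = [rng.choice(observed) for _ in observed]
            # ABB step 2: draw each imputation from the bootstrapped donor pool
            for i in missing:
                copy[i][var] = rng.choice(boot)
        completed.append(copy)
    return completed, no_donor_cells
```

Each completed copy could then be analyzed in SUDAAN and the m results pooled with Rubin's rules. Any cell reported in `cells_without_donors` shows exactly the limitation noted in idea 2: imputation within that cell is impossible without borrowing information from other cells.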
