Tom: As Craig notes, the likelihood ratio test is not valid here. But the problem has nothing to do with multiple imputation. It's because SURVEYLOGISTIC does not modify the likelihood to take clustering and other design factors into account. SURVEYLOGISTIC does conventional ML estimation and then adjusts the standard errors to account for the design factors. Methods for combining log-likelihoods in the multiple imputation setting are, in fact, well established (see, e.g., my 2001 book, Missing Data).
As Craig also notes, the solution here is to do Wald tests. But it's actually very easy to do this with PROC MIANALYZE using the TEST statement. For example, to test model 4 against model 3, run model 4 and include the statement:

test disease1=0, disease2=0 / mult;

To test model 3 against model 2, run model 3 and include the statement:

test rxuse=0, healthvisits=0 / mult;

The TEST statement in MIANALYZE is slightly different from the one in PROC REG and PROC LOGISTIC. Specifically, if you omit the MULT option, you only get separate tests for each hypothesis. MULT gives the joint test.

-----------------------------------------------------------------
Paul D. Allison, Professor and Chair
Department of Sociology
University of Pennsylvania
3718 Locust Walk
Philadelphia, PA 19104-6299
215-898-6712, 215-898-6717
215-573-2081 (fax)
http://www.ssc.upenn.edu/~allison

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Craig Newgard
Sent: Thursday, March 29, 2007 8:40 PM
To: [email protected]; [email protected]
Subject: Re: [Impute] How to calculate likelihood ratio test for multiply-imputed data using proc surveylogis

Tom,

I have struggled with a similar issue recently. It appears that you're working with complex survey data (clusters, strata, weights), which adds complexity both to the MI model and to the analysis.

For the MI model, you should include features important to the sampling design (i.e., clusters and strata) in order to minimize bias in the MI results (Raghunathan has a paper on this).

For the analysis, the LR test can't be used with complex survey data. Instead, the Wald statistic can be used. I'd suggest running your MI model, then analyzing each of the imputed data sets and saving the covariance matrices and point estimates (I have used Stata for this part, as it was easier than in SAS), which can then be used to compute the joint significance of multiple terms (NORM has a fairly easy mechanism for this).
One other option is to use the "test" command in Stata for each model (whichever joint terms you'd like to test). This will give you some idea as to joint significance, though Stata micombine is not yet able to handle multiply imputed complex survey data.

Hope this helps.

Craig

Craig D. Newgard, MD, MPH
Assistant Professor
Department of Emergency Medicine
Department of Public Health and Preventive Medicine
Center for Policy and Research in Emergency Medicine
Oregon Health & Science University
3181 SW Sam Jackson Park Road
Mail Code CR-114
Portland, Oregon 97239-3098
phone (503) 494-1668
fax (503) 494-4640
[email protected]

>>> "Bohman, Thomas M" <[email protected]> 03/29/07 10:08 AM >>>

Greetings,

I am using SAS proc surveylogistic to estimate four nested models. I've presented simplified versions of each model below.

Predictor       Model1  Model2  Model3  Model4
Age               x       x       x       x
Gender            x       x       x       x
WorkingStatus             x       x       x
Region                    x       x       x
HealthVisits                      x       x
RXuse                             x       x
Disease1                                  x
Disease2                                  x

I would like to test the joint effect of adding each additional set of variables to the predictors entered in previous models.
I would normally calculate the Likelihood Ratio (LR) test by multiplying -2 by the difference in the log-transformed likelihoods, as shown below:

LR = -2*(lnL1-lnL2)

where ln is the log transformation, L1 is the likelihood for Model 1, and L2 is the likelihood for Model 2. The LR value is distributed as chi-square with degrees of freedom equal to the difference in the number of predictors between the two models.

However, my question arises from using multiple imputation (proc mi) to create 10 imputed datasets and then using proc mianalyze to combine the results from these ten datasets and obtain the correct test statistics. I'm not sure how to deal with the LR test, since there are 10 different values of the log-likelihood for each model. One simple strategy would be to average the log-likelihoods across the 10 imputed datasets and use the averaged results. However, I can't find any literature that supports this approach.

I've included below the basic code that I'm using to run one of the models.

**-----------------------------------------------------------------------------**;
**-- Create Multiple Imputations Model with all predictors--**;
**-----------------------------------------------------------------------------**;
proc mi data=nhis.nhis_aa_recode3 seed=21355417 nimpute=10
        out = nhis_aa_recode3_imp;
   mcmc chain=multiple displayinit initial=em(itprint);
   var Age Gender WorkingStatus Region HealthVisits RXuse Disease1 Disease2 ;
run;
ods output close;

**-----------------------------------------------------------------------------**;
**-- Run Model 1 predictors--**;
**-----------------------------------------------------------------------------**;
proc surveylogistic data=nhis_aa_recode3_imp ;
   cluster h_psu;
   strata h_stratum;
   WEIGHT h_WTFA_SA;
   model dependent(descending) = Age Gender / COVB expb;
   by _imputation_;
   ods output Parameterestimates=gmparms1 COVB=COVMAT1;
   title3 'Survey logistic results for Model 1';
run;

**-----------------------------------------------------------------------------**;
**-- Combine Results for Model 1 predictors--**;
**-----------------------------------------------------------------------------**;
proc mianalyze parms=gmparms1 COVB=COVMAT1 mult;
   modeleffects Age Gender ;
   title3 'Proc MIanalyze results for Model 1';
run;

Any feedback on how to accomplish this test would be greatly appreciated! Any examples showing how to do so in SAS code would be doubly appreciated!!

With best regards,

Tom

Tom Bohman, Ph.D.
Research Scientist
Addiction Research Institute/GCATTC
Center for Social Work Research
University of Texas at Austin
1 University Station R5000
Austin, TX 78712
(512) 232-0605
[email protected]

_______________________________________________
Impute mailing list
[email protected]
http://lists.utsouthwestern.edu/mailman/listinfo/impute
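
Putting the pieces of the thread together, the Wald-test approach Paul describes might look roughly like the following for Model 4 versus Model 3. This is only a sketch under stated assumptions: it reuses the imputed data set, design variables, and outcome name from Tom's code; the output data set names gmparms4 and covmat4 are hypothetical; and it assumes every predictor enters as a single numeric term (no CLASS statement), as in Tom's simplified code.

```sas
**-- Sketch: fit Model 4 on each imputed data set --**;
**-- (gmparms4 and covmat4 are illustrative names) --**;
proc surveylogistic data=nhis_aa_recode3_imp;
   cluster h_psu;
   strata h_stratum;
   weight h_WTFA_SA;
   model dependent(descending) = Age Gender WorkingStatus Region
                                 HealthVisits RXuse Disease1 Disease2 / covb;
   by _imputation_;
   ods output ParameterEstimates=gmparms4 CovB=covmat4;
run;

**-- Sketch: combine results and jointly test the terms added in Model 4 --**;
proc mianalyze parms=gmparms4 covb=covmat4;
   modeleffects Age Gender WorkingStatus Region
                HealthVisits RXuse Disease1 Disease2;
   * MULT requests the joint test of both hypotheses, per Paul's note;
   test Disease1=0, Disease2=0 / mult;
run;
```

The same pattern would apply to Model 3 versus Model 2: fit Model 3 and replace the TEST statement with the one Paul gives for RXuse and HealthVisits. The effects named in TEST should also appear in MODELEFFECTS.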
