Paul,
Run-time comparisons may be somewhat appropriate in the case of EM-based methods.
In my edit/imputation course, I begin by running an EM algorithm for a
600,000-cell contingency table, 200 iterations with a convergence tolerance of
10^-12, in 45 seconds on a basic Core i7 PC. I challenge the students to write
a comparable algorithm in R or SAS that converges in less than one week. If
the time to produce the model is x and you want N copies of the output plus
the additional processing for MI, then the total time is approximately x + y,
where y is a relatively small amount of time to finalize the MI part. Drawing
N copies from the model takes only a fraction of the time needed to create the
model.
Another timing is given at the end. The EM-based methods are compared to the
full Bayesian methods in the two JASA papers. The EM-type methods can be used
to draw multiple copies of the data if necessary. The Bayesian methods are
generally superior. The compute-intensive part of the methods is the creation
of the limiting distributions (models). Drawing extra copies from the models
can be very fast.
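For concreteness, here is a minimal sketch of that fit-once, draw-many pattern
(Python/NumPy; the 4 x 5 table, the saturated two-way model, and the
multinomial draws are illustrative assumptions, not the DISCRETE system): EM
runs once to estimate the cell probabilities, and the N imputed copies are
then cheap draws from the fitted model.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a 4 x 5 contingency table; some records observe only
# the row variable, so the column variable must be imputed.
R, C = 4, 5
true_p = rng.dirichlet(np.ones(R * C)).reshape(R, C)          # "true" cell probabilities
n_obs = rng.multinomial(5000, true_p.ravel()).reshape(R, C)   # fully observed counts
miss_rows = rng.multinomial(1000, true_p.sum(axis=1))         # counts observed on row only

# EM: estimate the cell probabilities p (the expensive, one-time step x).
p = np.full((R, C), 1.0 / (R * C))
for _ in range(200):                                          # iteration cap, as in the email
    # E-step: allocate the row-only counts across columns in proportion to p.
    expected = n_obs + miss_rows[:, None] * p / p.sum(axis=1, keepdims=True)
    # M-step: re-estimate the cell probabilities from the completed table.
    p_new = expected / expected.sum()
    if np.max(np.abs(p_new - p)) < 1e-12:                     # epsilon comparable to 10^-12
        p = p_new
        break
    p = p_new

# Drawing N imputed copies from the fitted model (the cheap step y).
N = 10
imputed_tables = []
for _ in range(N):
    draws = np.vstack([rng.multinomial(int(m), p[r] / p[r].sum())
                       for r, m in enumerate(miss_rows)])
    imputed_tables.append(n_obs + draws)

The EM loop runs once; the final loop is why N extra copies add only a small y
to the total time.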
Regards. Bill
There have been recent major advances in edit/imputation that compare
favorably with our edit methods (Winkler 1995, 1997ab, 2003, 2008, 2010),
which had previously been powerful and the fastest in the world.
Kim, H., Cox, L. H., Karr, A. F., Reiter, J. P., and Wang, Q. (2015),
"Simultaneous Edit-Imputation for Continuous Microdata," Journal of the
American Statistical Association, 110, 987-999.
Manrique-Vallier, D. and Reiter, J. P. (2017), "Bayesian Simultaneous Edit and
Imputation for Multivariate Categorical Data," Journal of the American
Statistical Association (online version available September 16, 2016).
The 2017 paper improves on our methods in preserving joint distributions but
needs some additional enhancements. Their methods are 2,000-8,000 times
slower than our methods. Using the DISCRETE system (Winkler 1997, 2003, 2008,
2010) on a server with 20 CPUs, we can process the Decennial short form in
less than twelve hours.
________________________________
From: Impute -- Imputations in Data Analysis
<[email protected]> on behalf of Paul von Hippel
<[email protected]>
Sent: Monday, May 8, 2017 12:41 PM
To: [email protected]
Subject: Run time
Does anyone know of work on the run time of different MI algorithms? Every MI
user knows that some MI software can be slow in large datasets, but it's not
something I've seen discussed in the MI literature.
Best wishes,
Paul von Hippel
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX 78712
(512) 537-8112