Paul,
Run-time comparisons may be somewhat appropriate in the case of EM-based methods.
In my edit/imputation course, I begin by running an EM algorithm for a
600,000-cell contingency table, 200 iterations with a convergence tolerance of
10^-12, in 45 seconds on a basic Core i7 PC. I challenge the students to write
a comparable algorithm in R or SAS that converges in less than one week. If
the time to produce the model is x and you want N copies of the output plus
the additional processing for MI, then the total time is approximately x + y,
where y is a relatively small amount of time to finalize the MI part. Drawing
N copies from the model takes only a fraction of the time needed to create the
model.
Another timing is given at the end. The EM-based methods are compared to the
full Bayesian methods in the two JASA papers. The EM-type methods can be used
to draw multiple copies of the data if necessary. The Bayesian methods are
generally superior. The compute-intensive part of the methods is the creation
of the limiting distributions (models). Drawing extra copies from the models
can be very fast.
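For concreteness, here is a minimal sketch of that fit-once, draw-many pattern
(Python/NumPy; the 4 x 5 table, the saturated two-way model, and the
multinomial draws are illustrative assumptions, not the DISCRETE system): EM
runs once to estimate the cell probabilities, and the N imputed copies are
then cheap draws from the fitted model.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a 4 x 5 contingency table; some records observe only
# the row variable, so the column variable must be imputed.
R, C = 4, 5
true_p = rng.dirichlet(np.ones(R * C)).reshape(R, C)          # "true" cell probabilities
n_obs = rng.multinomial(5000, true_p.ravel()).reshape(R, C)   # fully observed counts
miss_rows = rng.multinomial(1000, true_p.sum(axis=1))         # counts observed on row only

# EM: estimate the cell probabilities p (the expensive, one-time step x).
p = np.full((R, C), 1.0 / (R * C))
for _ in range(200):                                          # iteration cap, as in the email
    # E-step: allocate the row-only counts across columns in proportion to p.
    expected = n_obs + miss_rows[:, None] * p / p.sum(axis=1, keepdims=True)
    # M-step: re-estimate the cell probabilities from the completed table.
    p_new = expected / expected.sum()
    if np.max(np.abs(p_new - p)) < 1e-12:                     # epsilon comparable to 10^-12
        p = p_new
        break
    p = p_new

# Drawing N imputed copies from the fitted model (the cheap step y).
N = 10
imputed_tables = []
for _ in range(N):
    draws = np.vstack([rng.multinomial(int(m), p[r] / p[r].sum())
                       for r, m in enumerate(miss_rows)])
    imputed_tables.append(n_obs + draws)

The EM loop runs once; the final loop is why N extra copies add only a small y
to the total time.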
Regards. Bill
There have been recent major advances in edit/imputation that compare
favorably with our edit methods (Winkler 1995, 1997ab, 2003, 2008, 2010),
which had previously been powerful and the fastest in the world.
Kim, H., Cox, L. H., Karr, A. F., Reiter, J. P., and Wang, Q. (2015),
"Simultaneous Edit-Imputation for Continuous Microdata," Journal of the
American Statistical Association, 110, 987-999.
Manrique-Vallier, D. and Reiter, J. P. (2017), "Bayesian Simultaneous Edit and
Imputation for Multivariate Categorical Data," Journal of the American
Statistical Association (online version available September 16, 2016).
The 2017 paper improves on our methods in preserving joint distributions but
needs some additional enhancements. Their methods are 2,000-8,000 times
slower than our methods. Using the DISCRETE system (Winkler 1997, 2003, 2008,
2010) on a server with 20 CPUs, we can process the Decennial short form in
less than twelve hours.
________________________________
From: Impute -- Imputations in Data Analysis
<[email protected]> on behalf of Paul von Hippel
<[email protected]>
Sent: Monday, May 8, 2017 12:41 PM
To: [email protected]
Subject: Run time
Does anyone know of work on the run time of different MI algorithms? Every MI
user knows that some MI software can be slow in large datasets, but it's not
something I've seen discussed in the MI literature.
Best wishes,
Paul von Hippel
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX 78712
(512) 537-8112