On 08-Apr-09 22:10:26, Ravi Varadhan wrote:
> The EM algorithm is a better approach for maximum likelihood
> estimation of finite-mixture models than direct maximization of the
> mixture log-likelihood. Due to its ascent property, it is guaranteed
> to converge to a local maximum. By theoretical construction, it also
> automatically ensures that all the parameter constraints are
> satisfied.
>
> The problem with mixture estimation, however, is that the local
> maximum may not be a good solution. Even the global maximum may not
> be a good solution. For example, it is well known that in a Gaussian
> mixture you can get a log-likelihood of +infinity by letting mu =
> one of the data points and sigma = 0 for one of the mixture
> components. You can detect trouble in the MLE if you observe one of
> the following: (1) a degenerate solution, i.e. one of the mixing
> proportions is close to 0, or (2) one of the sigmas is too small.
>
> EM convergence is sensitive to starting values. Although there are
> several starting strategies (see McLachlan and Basford's book on
> finite mixtures), there is no universally sound strategy for
> ensuring a good estimate (e.g. the global maximum, when that makes
> sense). It is always good practice to run EM from multiple starting
> values. Be warned that EM convergence can be excruciatingly slow.
> Acceleration methods can be of help in large simulation studies or
> for bootstrapping.
>
> Best,
> Ravi.
Well put! I've done a fair bit of mixture-fitting in my time, and one
approach which can be worth trying (and for which there is often a
reasonably objective justification) is to do a series of fits (assuming
a given number of components, e.g. 2 for the present purpose) with the
sigmas in a constant ratio in each one:

    sigma2 = lambda*sigma1

Then you won't get those singularities where the fit tries to climb up
a single data point. If you do this for a range of values of lambda,
and keep track of the log-likelihood, then you get in effect a profile
likelihood for lambda. Once you see what that looks like, you can then
set about penalising the ranges you don't want to see. (Rough R
sketches of the degeneracy itself and of this constrained fit are
appended below, after my signature.)

The reason I say "there is often a reasonably objective justification"
is that often you can be pretty sure, if there is a mixture present,
that lambda does not go outside say 1/10 to 10 (or whatever), since you
expect that the latent groups are fairly similar, and unlikely to have
markedly different sigmas.

As to acceleration: agreed, EM can be slow. Aitken acceleration can be
dramatically faster. Several outlines of the Aitken procedure can be
found by googling "aitken acceleration". I recently wrote a short note
describing its general principle and outlining its application to the
EM algorithm, using the probit model as illustration (with R code). For
fitting the location parameter alone, raw EM took 59 iterations;
Aitken-accelerated EM took 3. For fitting the location and scale
parameters, raw EM took 108, and Aitken took 4. (A minimal sketch of
the Aitken step, on a different example, is also appended below.) If
anyone would like a copy (PS or PDF) of the note, drop me a line.

Or is there some "repository" for R-help (like the "Files" area in
Google Groups) where one can place files for others to download?
[And, if not, do people think it might be a good idea?]

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 09-Apr-09                                       Time: 00:33:02
------------------------------ XFMail ------------------------------
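P.S. Three rough sketches in R, for concreteness. First, the degeneracy
Ravi describes: pin the mean of one Gaussian component at a data point
and let its sigma shrink, and the mixture log-likelihood grows without
bound. (The data, mixing weight, and second component below are
arbitrary, purely for illustration.)

set.seed(1)
x <- rnorm(50)                       # any data set will do

mixloglik <- function(x, p, mu1, s1, mu2, s2) {
  sum(log(p * dnorm(x, mu1, s1) + (1 - p) * dnorm(x, mu2, s2)))
}

## Put mu1 on the first data point and shrink sigma1:
for (s1 in c(1, 0.1, 0.01, 1e-4, 1e-8)) {
  cat("sigma1 =", s1, "  log-lik =",
      mixloglik(x, p = 0.5, mu1 = x[1], s1 = s1, mu2 = 0, s2 = 1), "\n")
}
## The log-likelihood increases without bound as sigma1 -> 0.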
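Next, the constrained fit itself. With lambda held fixed, the M-step
still has a closed form: if r_i is the responsibility of component 1
for observation x_i, then

    sigma1^2 = ( sum r_i*(x_i - mu1)^2
               + sum (1 - r_i)*(x_i - mu2)^2 / lambda^2 ) / n

What follows is only a sketch of the approach (no safeguards against
empty components; arbitrary starting values; a made-up data set), not
production code:

em2.fixed.lambda <- function(x, lambda, p = 0.5,
                             mu = as.numeric(quantile(x, c(0.25, 0.75))),
                             s1 = sd(x), maxit = 500, tol = 1e-8) {
  n <- length(x)
  ll.old <- -Inf
  for (it in 1:maxit) {
    ## E-step: component densities and responsibilities
    d1 <- p * dnorm(x, mu[1], s1)
    d2 <- (1 - p) * dnorm(x, mu[2], lambda * s1)
    ll <- sum(log(d1 + d2))          # log-likelihood at current values
    if (abs(ll - ll.old) < tol) break
    ll.old <- ll
    r <- d1 / (d1 + d2)
    ## M-step: weight, means, and the single free scale sigma1
    p  <- mean(r)
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
    s1 <- sqrt((sum(r * (x - mu[1])^2) +
                sum((1 - r) * (x - mu[2])^2) / lambda^2) / n)
  }
  list(p = p, mu = mu, s1 = s1, s2 = lambda * s1, loglik = ll)
}

## Trace out the profile log-likelihood for lambda over 1/10 to 10:
x <- c(rnorm(70, 0, 1), rnorm(30, 3, 2))     # toy two-group data
lambdas <- exp(seq(log(0.1), log(10), length = 41))
prof <- sapply(lambdas, function(l) em2.fixed.lambda(x, l)$loglik)
plot(lambdas, prof, log = "x", type = "l",
     xlab = "lambda", ylab = "profile log-likelihood")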
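Finally, the Aitken mechanics. The probit code from my note is not
reproduced here; instead, here is the same idea applied to the classic
multinomial EM example of Dempster, Laird and Rubin (1977), where one
EM step g() has a simple closed form. From three successive iterates
t0, t1 = g(t0), t2 = g(t1), the Aitken-accelerated value is

    t0 - (t1 - t0)^2 / (t2 - 2*t1 + t0)

and one restarts the iteration from there (Steffensen's method). Note
each accelerated step costs two EM steps:

g <- function(theta) {              # one EM step, data (125, 18, 20, 34)
  x2 <- 125 * theta / (2 + theta)   # expected count in the split cell
  (x2 + 34) / (x2 + 18 + 20 + 34)
}

## Raw EM
theta <- 0.5; iters <- 0
repeat {
  th <- g(theta); iters <- iters + 1
  if (abs(th - theta) < 1e-10) { theta <- th; break }
  theta <- th
}
cat("raw EM:   ", iters, "iterations, theta =", theta, "\n")

## Aitken-accelerated EM
theta <- 0.5; iters <- 0
repeat {
  t1 <- g(theta); t2 <- g(t1); iters <- iters + 1
  den  <- t2 - 2 * t1 + theta
  acc  <- if (den == 0) t2 else theta - (t1 - theta)^2 / den
  done <- abs(acc - theta) < 1e-10
  theta <- acc
  if (done) break
}
cat("Aitken EM:", iters, "accelerated steps, theta =", theta, "\n")

On this example raw EM is already fairly quick; the gap widens as the
fraction of missing information grows, which is where the acceleration
really pays off.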