[
https://issues.apache.org/jira/browse/MAHOUT-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008137#comment-13008137
]
Ted Dunning commented on MAHOUT-627:
------------------------------------
Sounds like a good project, especially if it helps the community learn about issues for code reuse in EM algorithms.
> Parallelization of Baum-Welch Algorithm for HMM Training
> ---------------------------------------------------------
>
> Key: MAHOUT-627
> URL: https://issues.apache.org/jira/browse/MAHOUT-627
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Affects Versions: 0.4
> Reporter: Dhruv Kumar
> Priority: Minor
>
> Among the unsupervised learning methods available in the current sequential
> implementation of HMM training (MAHOUT-396), the Baum-Welch (BW) algorithm is
> an attractive candidate for a parallel, MapReduce implementation. Although
> slower than Viterbi training, BW is more numerically stable and provides
> guaranteed discovery of the Maximum Likelihood Estimate (MLE); Viterbi
> training approximates the MLE in order to reduce computation time. A
> parallel, MapReduce implementation of BW would allow scalable model learning
> over large data sets, and the resulting model could then be used for
> prediction with the current sequential implementation of the Viterbi
> decoding algorithm.
> BW is a special case of the Expectation Maximization (EM) algorithm, and it
> has been shown that EM algorithms are candidates for a MapReduce
> implementation:
> http://www-cs.stanford.edu/~ang/papers/nips06-mapreducemulticore.pdf. Also,
> k-means clustering is an EM-style algorithm with an existing MapReduce
> implementation in Mahout, which hints at the possibility of code reuse to
> aid parallel BW development. Another possibility for a parallel
> implementation would be to use the notion of matrix probability, as shown
> by Turin: http://tagh.de/tom/wp-content/uploads/turin-1998.pdf.
> Although non-trivial, a parallel BW implementation would greatly add to
> Mahout's existing set of HMM functionality.
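To illustrate why BW decomposes naturally into MapReduce, here is a minimal NumPy sketch (not Mahout code; all function names are hypothetical). The "map" step runs scaled forward-backward on one observation sequence and emits expected transition/emission counts; the "reduce" step sums the counts across sequences and renormalizes them into new model parameters (the M-step).

```python
import numpy as np

def forward_backward_counts(obs, A, B, pi):
    """Map step (sketch): expected counts for one observation sequence."""
    T, N = len(obs), A.shape[0]
    # Forward pass with per-step scaling for numerical stability.
    alpha = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass reusing the forward scaling factors.
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    # State posteriors gamma_t(i), normalized per time step.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Expected transition counts (xi summed over time).
    trans = np.zeros_like(A)
    for t in range(T - 1):
        trans += (alpha[t][:, None] * A
                  * B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    # Expected emission counts.
    emit = np.zeros_like(B)
    for t in range(T):
        emit[:, obs[t]] += gamma[t]
    return trans, emit, gamma[0]

def baum_welch_step(sequences, A, B, pi):
    """Reduce step (sketch): sum counts, then renormalize (the M-step)."""
    trans, emit, start = np.zeros_like(A), np.zeros_like(B), np.zeros_like(pi)
    for obs in sequences:
        t, e, g0 = forward_backward_counts(obs, A, B, pi)
        trans += t
        emit += e
        start += g0
    A_new = trans / trans.sum(axis=1, keepdims=True)
    B_new = emit / emit.sum(axis=1, keepdims=True)
    pi_new = start / start.sum()
    return A_new, B_new, pi_new
```

Because the per-sequence count computation is independent, the map step parallelizes over sequences, and the reduce step is a simple sum, which is exactly the shape of computation MapReduce handles well.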
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira