[
https://issues.apache.org/jira/browse/MAHOUT-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008137#comment-13008137
]
Ted Dunning commented on MAHOUT-627:
------------------------------------
Sounds like a good project, especially if it helps the community learn about issues for code reuse in EM algorithms.
> Parallelization of Baum-Welch Algorithm for HMM Training
> ---------------------------------------------------------
>
> Key: MAHOUT-627
> URL: https://issues.apache.org/jira/browse/MAHOUT-627
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Affects Versions: 0.4
> Reporter: Dhruv Kumar
> Priority: Minor
>
> Among the unsupervised learning methods available in the current sequential
> implementation of HMM training (MAHOUT-396), the Baum-Welch (BW) algorithm is
> an attractive candidate for a parallel, MapReduce implementation. Although
> slower than Viterbi training, BW is more numerically stable and provides
> guaranteed discovery of the Maximum Likelihood Estimate (MLE); Viterbi
> training approximates the MLE in order to reduce computation time. A
> parallel, MapReduce implementation of BW would allow scalable model learning
> over large data sets, and the resulting model could then be used for
> prediction with the current sequential implementation of the Viterbi
> decoding algorithm.
> BW is a special case of the Expectation Maximization (EM) algorithm, and it
> has been shown that EM algorithms are candidates for a MapReduce
> implementation:
> http://www-cs.stanford.edu/~ang/papers/nips06-mapreducemulticore.pdf. Also,
> k-means clustering is an EM-style algorithm with an existing MapReduce
> implementation in Mahout, which hints at the possibility of code reuse to
> aid parallel BW development. Another possibility for a parallel
> implementation would be to use the notion of matrix probability, as shown
> by Turin: http://tagh.de/tom/wp-content/uploads/turin-1998.pdf.
> Although non-trivial, a parallel BW implementation would greatly add to
> Mahout's existing set of HMM functionality.
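To illustrate why BW decomposes naturally into MapReduce, here is a minimal NumPy sketch (not Mahout code; all function names are hypothetical). The "map" step runs scaled forward-backward on one observation sequence and emits expected transition/emission counts; the "reduce" step sums the counts across sequences and renormalizes them into new model parameters (the M-step).

```python
import numpy as np

def forward_backward_counts(obs, A, B, pi):
    """Map step (sketch): expected counts for one observation sequence."""
    T, N = len(obs), A.shape[0]
    # Forward pass with per-step scaling for numerical stability.
    alpha = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass reusing the forward scaling factors.
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    # State posteriors gamma_t(i), normalized per time step.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Expected transition counts (xi summed over time).
    trans = np.zeros_like(A)
    for t in range(T - 1):
        trans += (alpha[t][:, None] * A
                  * B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    # Expected emission counts.
    emit = np.zeros_like(B)
    for t in range(T):
        emit[:, obs[t]] += gamma[t]
    return trans, emit, gamma[0]

def baum_welch_step(sequences, A, B, pi):
    """Reduce step (sketch): sum counts, then renormalize (the M-step)."""
    trans, emit, start = np.zeros_like(A), np.zeros_like(B), np.zeros_like(pi)
    for obs in sequences:
        t, e, g0 = forward_backward_counts(obs, A, B, pi)
        trans += t
        emit += e
        start += g0
    A_new = trans / trans.sum(axis=1, keepdims=True)
    B_new = emit / emit.sum(axis=1, keepdims=True)
    pi_new = start / start.sum()
    return A_new, B_new, pi_new
```

Because the per-sequence count computation is independent, the map step parallelizes over sequences, and the reduce step is a simple sum, which is exactly the shape of computation MapReduce handles well.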
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira