[
https://issues.apache.org/jira/browse/MAHOUT-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054870#comment-13054870
]
Grant Ingersoll commented on MAHOUT-652:
----------------------------------------
Awesome! How goes the testing?
> [GSoC Proposal] Parallel Viterbi algorithm for HMM
> --------------------------------------------------
>
> Key: MAHOUT-652
> URL: https://issues.apache.org/jira/browse/MAHOUT-652
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.5
> Reporter: Sergey Bartunov
> Assignee: Isabel Drost
> Labels: classification, gsoc, gsoc11, hmm, mahout-gsoc-11,
> sequence-learning
> Fix For: 0.6
>
>
> Proposal Title: Parallel Viterbi algorithm for HMM
> Student Name: Sergey Bartunov
> Student E-mail: [email protected]
>
> Organization/Project: Apache Mahout
> Assigned Mentor:
> Proposal Abstract:
> The Viterbi algorithm is a decoding algorithm for Hidden Markov Models [1].
> It estimates the most likely sequence of hidden states, called the "Viterbi
> path", from a given sequence of observed states. The Viterbi algorithm is
> also used in Viterbi training, which is less numerically stable than the
> Baum-Welch algorithm but faster, and can be adjusted to reach nearly the
> same accuracy [2]. Hidden Markov Models are widely used for speech and
> gesture recognition, part-of-speech tagging, bioinformatics, etc. At the
> moment Apache Mahout contains only sequential HMM functionality, and this
> project is intended to extend it by implementing a Map-Reduce version of the
> Viterbi algorithm, which would let Mahout evaluate HMMs on large amounts of
> data in parallel.
> Detailed Description:
> The Viterbi algorithm is a fairly common dynamic programming algorithm; it
> is well described in many sources, such as [3], [4], [5]. Being popular and
> widely needed, the Viterbi algorithm already has parallel versions, which
> have been the subject of several studies, for example this paper about a
> GPU implementation [6]. Some of these papers may contain useful ideas for a
> map-reduce implementation.
> I have explored several possible strategies for parallelization (which one
> is better is an open question for the project) and posted them to the
> mailing list.
> See [this
> thread|http://mail-archives.apache.org/mod_mbox/mahout-dev/201104.mbox/%[email protected]%3E]
> The most reasonable one is:
> Assume we have O - the set of observed state sequences (that's our input
> data), where $O_i$ is the i-th sequence with length len($O_i$), K is the
> number of hidden states in our model, and $O_{i, t}$ is the observed state
> of the i-th sequence at the moment of time t. Let N be the maximum of
> len($O_i$).
> Let's write all the sequences one above another. We get a matrix whose rows
> are the input sequences. Then take the first L columns of this matrix and
> call them the "first chunk", then the next L columns, and so on. Thus we
> divide our input into chunks of width L (the last chunk may be narrower).
> L is a parameter of the algorithm. So the n-th chunk contains the
> subsequence elements with $t$ in the interval [L*n, L*(n+1)).
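> The chunk division described above can be sketched as follows (an
> illustrative sketch only, not part of the Mahout API; the class and method
> names are made up):

```java
// Hypothetical sketch of the chunk-division step: each sequence (a matrix
// row) is cut into column chunks of width L; chunk n covers time steps
// t in [L*n, L*(n+1)). Not Mahout code - names are illustrative.
public class ChunkSplitter {

    /**
     * Returns the part of sequence seq falling into chunk number n,
     * i.e. the observed states at time steps t in [L*n, L*(n+1)),
     * or an empty array if the sequence has already ended.
     */
    public static int[] chunk(int[] seq, int L, int n) {
        int from = L * n;
        int to = Math.min(L * (n + 1), seq.length);
        if (from >= seq.length) {
            return new int[0]; // this sequence is shorter than L*n
        }
        return java.util.Arrays.copyOfRange(seq, from, to);
    }
}
```

> Note that the last chunk of a sequence may be narrower than L, which is
> exactly the "good case" / "difficult case" distinction discussed below.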
> The key idea is to process the chunks one after another, handling each
> chunk with the standard sequential Viterbi algorithm (the existing Mahout
> code could be reused here) applied to all of its subsequences in parallel,
> using the results of the previous chunk's processing.
> This will require about O((M/P) * K^2) time, where M is the sum of all
> sequence lengths and P is the number of "cores" / computational nodes,
> provided most input sequences have nearly the same length (the good case).
> This time complexity means that such a strategy scales well.
> Here is Map-Reduce scheme for the good case:
> * Map emits a (subsequence, probabilities, paths) vector for each
> subsequence in the chunk, where probabilities are the initial probabilities
> in the case of the first chunk and the optimal-path probabilities in all
> other cases, and paths are the optimal paths leading to the first
> observation of each subsequence.
> * The Reduce function just performs the sequential Viterbi algorithm on
> these data and returns (probabilities, paths).
> * The Driver class runs Map then Reduce for the first chunk, then Map and
> Reduce for the second, and so on, and finally delivers the results
> (probably using another map-reduce combination).
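> As a sketch of the work the Reduce step would do per subsequence, here is
> the log-space Viterbi recursion over one chunk, carrying per-state
> probabilities in from the previous chunk (a hypothetical sketch; the names
> and signatures are made up, not the existing Mahout HMM API):

```java
// Hypothetical sketch of per-chunk Viterbi processing in log space.
// logV[k] on entry is the best log-probability of any path ending in state
// k just before this chunk (the initial log-probabilities for the first
// chunk). Not Mahout code - names are illustrative.
public class ChunkViterbi {

    /**
     * Runs the Viterbi recursion over one chunk of observations.
     * logA[j][k]  - log transition probability from state j to state k
     * logB[k][o]  - log emission probability of observation o in state k
     * backptr[t][k] is filled with the arg-max predecessor of state k at
     * local time t, to be traced back during path reconstruction.
     * Returns the updated per-state log-probabilities after the chunk.
     */
    public static double[] forward(double[] logV, double[][] logA,
                                   double[][] logB, int[] obs,
                                   int[][] backptr) {
        int K = logV.length;
        double[] cur = logV.clone();
        for (int t = 0; t < obs.length; t++) {
            double[] next = new double[K];
            for (int k = 0; k < K; k++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int j = 0; j < K; j++) {
                    double p = cur[j] + logA[j][k];
                    if (p > best) { best = p; arg = j; }
                }
                next[k] = best + logB[k][obs[t]];
                backptr[t][k] = arg;
            }
            cur = next;
        }
        return cur;
    }
}
```

> Working in log space (adding log-probabilities instead of multiplying
> probabilities) avoids underflow on long sequences, which matters here
> because chunks are processed in a long series.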
> It may also be possible to use the Lazy Viterbi algorithm [7, 8] instead of
> the standard Viterbi for chunk processing.
> Now let's focus on all other cases, when the first T chunks contain
> subsequences of nearly the same length and the remaining N-T chunks do not.
> These can be processed using the following technique (which is strategy
> number 2 in [the
> list|http://mail-archives.apache.org/mod_mbox/mahout-dev/201104.mbox/%[email protected]%3E]):
> First, denote by $V_{t,k}$ the probability that the optimal hidden state
> path goes through hidden state $k$ at moment $t$. We need such
> probabilities to select the most likely (optimal) path, so we need to
> compute $V_{t,k}$ for every $t$ and $k$ (that is part of the standard
> Viterbi algorithm). I omit the sequence index $i$ here, because each
> sequence is a separate task.
> 1) Process each chunk separately. Within chunk n, $t$ lies in the interval
> [L*n, L*(n+1)). To compute $V_{t,k}$ there, we need to know $V_{L*n-1,k1}$
> for each $k1$. But those values belong to the previous chunk (number n-1),
> which is processed separately. We need to obtain the max and arg-max of
> $V_{L*n-1, k1}$ over $k1$.
> 2) Since we don't know the maxes and arg-maxes, let's just assume that the
> optimal path goes through k1=1, then through k1=2, and so on up to K, with
> $V_{L*n-1, k1}$ = 1 (or 0 if we add log-probabilities instead of
> multiplying probabilities) and the arg-max taken as k1 = 1, 2, ..., K in
> turn. Later we can multiply $V_{L*(n+1)-1, k1}$ by the precise probability
> max{ $V_{L*n-1, k1}$ } (or add its log). Thus we get K^2 possibly optimal
> paths instead of K.
> 3) After we process all the chunks, we start the "connecting" phase.
> The first of these N-T chunks can be processed with the usual Viterbi
> recursion, since we know the results for the previous "good" chunk. All the
> others need to be connected to each other, starting by connecting the
> second with the first, then the third with the fourth, and so on. During
> the connection process we throw away all non-optimal paths (now we know
> which are optimal and which are not) and keep just K possibly optimal
> paths.
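> The connecting step, which collapses the K^2 speculative paths of a chunk
> back to K once the left boundary is known, might look like this (again a
> sketch with hypothetical names, not Mahout code):

```java
// Hypothetical sketch of the "connecting" phase. Given the now-known
// per-state values at the end of the previous chunk and this chunk's
// speculative K x K table, keep only the K optimal paths.
public class ChunkConnector {

    /**
     * prevV[k1]    - best log-probability of ending the previous chunk in
     *                state k1 (known after that chunk is resolved)
     * table[k1][k] - this chunk's speculative result for boundary state k1
     * argK1[k]     - filled with the winning boundary state for each k,
     *                needed to splice the partial paths together
     * Returns newV[k], the resolved per-state values at this chunk's end.
     */
    public static double[] connect(double[] prevV, double[][] table,
                                   int[] argK1) {
        int K = prevV.length;
        double[] newV = new double[K];
        for (int k = 0; k < K; k++) {
            double best = Double.NEGATIVE_INFINITY;
            int arg = 0;
            for (int k1 = 0; k1 < K; k1++) {
                double p = prevV[k1] + table[k1][k];
                if (p > best) { best = p; arg = k1; }
            }
            newV[k] = best;
            argK1[k] = arg;
        }
        return newV;
    }
}
```

> Each connection is a K x K max-plus product, so connecting a pair of
> chunks costs O(K^2) per boundary on top of the chunk processing itself.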
> To handle such difficult cases, the Driver class should know the sequence
> lengths, so it can recognize which Map-Reduce combination to use. The
> resulting time complexity for the difficult case is
> O(((T * L / N) / P) * K^2 + (((N - T) * L / N) / P) * K^3). The first term
> of this sum is the time to compute the first T chunks using the simple-case
> approach, and the second term is the time to compute the rest using the
> difficult-case approach. As T -> N, the overall time tends to
> O((M/P) * K^2). T * L / N is roughly the total length of the subsequences
> in the first T chunks, and (N - T) * L / N is the total length of the rest.
> The only problem not discussed here is how to deal with the obtained most
> likely paths. A path's length is equal to the length of its sequence, so
> it's not a good idea to emit optimal paths in the Map stage. They could be
> stored separately in files or in HBase, since they are only required when
> delivering the results to the user.
> This implementation should work (or be compatible) with
> org.apache.mahout.classifier.sequencelearning.hmm.HmmModel
> As you can see, the Viterbi algorithm uses a very common dynamic
> programming approach: it computes the next layer of the computation by
> combining values from the previous one. That is all we need to know to
> implement it in Map-Reduce, and it shares the mentioned difficulties with
> all other such dynamic programming algorithms, so the implementation may be
> highly reusable.
> Timeline:
> 1. 1 week. Explore Mahout internals, the sequential HMM code, and coding
> best practices. Think about the path-storage problem. I'll actually work on
> this stage before GSoC starts, but I leave one week here just to be sure.
> 2. 5 days. Write the chunk division routines.
> 3. 5 days. Write the SimpleMapper class for simple good case of data.
> 4. 1 week. Write the SimpleReducer class for simple good case.
> 5. 1 week. Write the Driver class for simple good case which will use
> SimpleMapper and SimpleReducer.
> 6. 1 week. Collect debug dataset, write tests for the code and find possible
> problems, fix them.
> 7. 10 days. Write the HardReducer class for the difficult case and rewrite
> the Driver class to handle such cases. Run the tests and ensure that
> everything works.
> 8. 10 days. Try to get a large dataset to test performance, scalability,
> etc., and write and run such a test (community help might be needed here).
> If there are any problems, discuss and analyze them, then fix them.
> 9. 1 week. Try to use an optimized Viterbi implementation (e.g. Lazy
> Viterbi) in the chunk-processing phase, and measure the speed improvement,
> especially for difficult cases.
> 10. 1 week. Write documentation for all classes, a description of the
> algorithm in the wiki, and a complete usage example.
> Additional Information:
> The project is related to [this
> proposal|https://issues.apache.org/jira/browse/MAHOUT-627], which is about
> implementing a parallel Baum-Welch training algorithm for HMMs.
> My motivation for this project is that I need such functionality as a
> regular Mahout user, since I often encounter HMMs in my research work. I'm
> also very interested in which kinds or classes of algorithms can be
> implemented efficiently in Map-Reduce; I think this is an important
> question for the community, and such experience is significant.
> One important detail is that I have exams at my university until June 16th.
> After that I will be ready to work on the project.
> References:
> [1] Lawrence R. Rabiner (February 1989). ["A tutorial on Hidden Markov Models
> and selected applications in speech
> recognition"|http://www.cs.cornell.edu/courses/cs481/2004fa/rabiner.pdf].
> Proceedings of the IEEE 77 (2): 257-286. doi:10.1109/5.18626.
> [2] [Adjusted Viterbi Training. A proof of
> concept.|http://personal.rhul.ac.uk/utah/113/VA/AVTPC.pdf]
> [3]
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.2081&rep=rep1&type=pdf
> [4] http://www.cambridge.org/resources/0521882672/7934_kaeslin_dynpro_new.pdf
> [5] http://www.kanungo.com/software/hmmtut.pdf
> [6] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5470903
> [7] http://www.scribd.com/doc/48568001/Lazy-Viterbi-Slides
> [8] http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1040367
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira