On Thu, Jul 17, 2003 at 01:24:41PM +1200, Schreiber, Mark wrote:
>
> On a more theoretical note, if one calculated the probability of the
> Viterbi path and compared that to the forward probability, would that be
> a good way of inferring confidence in the prediction's fit to the model?
> E.g. if the Viterbi path prob and the forward prob were close, you could be
> confident that, according to your model, other possible paths are not
> that likely. Alternatively, if they are not close, you might conclude that
> although the Viterbi path is the most parsimonious, there could be other
> paths that are almost as likely. Or am I barking up the wrong tree here?

Yes, I think that is a meaningful thing to do. Unfortunately, if the
probabilities *don't* match, it doesn't give you any clues as to where
the missing probability has "gone". Is it in a set of paths which are
quite similar to the optimal path, or are there completely different
solutions which are almost as probable as the optimum?

Ideally, you want an algorithm for sampling from the distribution of
likely paths. I've never encountered one of these, but I think there
may have been some work done on this. I know people who are into
protein structure prediction are sometimes interested in sub-optimal
sequence alignments. It's possible that the variational inference view
of HMMs could help, too.

An alternative approach, which BioJava *will* help you with, is to
calculate both the forward and the backward DP matrices, then multiply
these together cell by cell and normalize by the overall probability of
the sequence (the forward and backward passes both give this same
total). This then tells you the probability that a given symbol in the
sequence was generated by a given state in the model, considering all
possible paths. Depending on exactly what you're trying to do, this
might well be the confidence figure you're looking for. (This is also
the expectation stage of Baum-Welch HMM training.)

Thomas.
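
A minimal sketch of the two calculations discussed above, written in
plain Python rather than against the BioJava API. The two-state
"fair/loaded" coin model, all of its probabilities, and the function
names are assumptions made up for illustration; none of them come from
the thread. It computes the ratio of the Viterbi path probability to the
forward probability, and the per-position posterior state probabilities
obtained by multiplying the forward and backward matrices and dividing
by the total.

```python
# Toy two-state HMM; every number below is invented for illustration.
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.9, "loaded": 0.1},
         "loaded": {"fair": 0.1, "loaded": 0.9}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}


def forward(obs):
    """f[t][s] = P(obs[0..t], state at position t is s)."""
    f = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({s: EMIT[s][x] * sum(prev[r] * TRANS[r][s] for r in STATES)
                  for s in STATES})
    return f


def backward(obs):
    """b[t][s] = P(obs[t+1..end] | state at position t is s)."""
    b = [{s: 1.0 for s in STATES}]
    for x in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {s: sum(TRANS[s][r] * EMIT[r][x] * nxt[r] for r in STATES)
                     for s in STATES})
    return b


def viterbi(obs):
    """Return (probability, state path) of the single most probable path."""
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for x in obs[1:]:
        new = {}
        for s in STATES:
            # Best predecessor for state s at this position.
            p, path = max(((best[r][0] * TRANS[r][s], best[r][1])
                           for r in STATES), key=lambda c: c[0])
            new[s] = (p * EMIT[s][x], path + [s])
        best = new
    return max(best.values(), key=lambda c: c[0])


def posteriors(obs):
    """Return (per-position posterior state probabilities, P(obs))."""
    f, b = forward(obs), backward(obs)
    total = sum(f[-1].values())  # P(obs), summed over all paths
    post = [{s: f[t][s] * b[t][s] / total for s in STATES}
            for t in range(len(obs))]
    return post, total


obs = "HHHHTHTTHT"
post, total = posteriors(obs)
vit_p, vit_path = viterbi(obs)

# Fraction of the total probability carried by the Viterbi path alone.
print("Viterbi / forward:", vit_p / total)

# Posterior probability of the Viterbi state at each position.
for x, state, p in zip(obs, vit_path, post):
    print(x, state, round(p[state], 3))
```

A low ratio on the first line only says that the Viterbi path is not
dominant. The per-position column shows where the model is unsure, which
is the information the ratio cannot give. The sketch multiplies raw
probabilities, so it will underflow on long sequences; a real
implementation would work in log space or rescale each column.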