> Hi,
>
> New to HMMs and BioJava, so what I'm asking for is probably a dumb question.
> But I figure it better to ask it rather than sit here and be puzzled...
>
> From the wiki article
> http://www.biojava.org/wiki/BioJava:Tutorial:Dynamic_programming_examples
> and the post
> http://portal.open-bio.org/pipermail/biojava-l/2006-March/005387.html
>
> I get the sense that in order to create a third-order HMM, reading a
> protein sequence, and emitting symbols (e.g. create an alphabet
> TriGreek from "alpha","beta","delta"), you would need to create one
> state for each amino acid, and associate each state with a
> OrderNDistribution using a cross product alphabet as in
> AlphabetManager.generateCrossProductAlphaFromName("(Protein x Protein
> x TriGreek)").
>
> So if you walked through a trimer AGF which emitted "alpha", you would
> end in the state "F", which uses a OrderNDistribution where the first
> protein (in the cross product alphabet) corresponds to the "A", the
> second protein corresponds to the "G", and the last term corresponds
> to "alpha."
>
Your problem sounds like you are trying to estimate observations of amino
acid delta based on the previous 2 observations (a second order model).
Thus you would use an OrderNDistribution in which p(Delta) is conditioned
on Protein x Protein.

> This seems odd, so what I don't get, is should I be mixing emissions
> with previous states in the cross product alphabet to create a third
> order HMM? Or is there a better way?

An alternative would be to have your states emit 3 amino acids at once.
This would be a normal Distribution over the alphabet
Protein x Protein x Protein. Each amino acid triple would be completely
independent of the previous triple. This is not the same as the
OrderNDistribution, which emits single amino acids based on the previous
two.

>
> I'm even more confused about how to define transition weights.
>

Each state contains a Distribution of States. These States come from the
Alphabet of States that the state is connected to. The State classes
implement Symbol, so they can belong to Alphabets. The Distribution of
States gives the probability of transitioning to each State in the
Alphabet of States that the origin state connects to.

If your model is fully ergodic, each state connects to every other state,
so the transition Alphabet contains every other state. (In fact, in fully
ergodic models states can connect to themselves, so the transition
Alphabet would include all states, including the Magic state.) If your
model has a more complex architecture, then the transition Alphabet will
include only the states you can transition to.

Hope this helps.

> Obviously, I'm wrong about something... How do you define
> states/distributions in a third order HMM?
>
> Thanks
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
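
To make the conditioning idea in the reply above concrete, here is a minimal plain-Java sketch of a second-order table, i.e. p(next residue | previous two residues). It deliberately does not call BioJava: the class name SecondOrderEmissions and its methods are made up for illustration, and it only mimics what an OrderNDistribution conditioned on Protein x Protein would estimate from counts. Treat it as a sketch under those assumptions, not as how BioJava implements it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not BioJava API): a second-order conditional
 * table p(next | previous two), estimated from raw counts.
 */
final class SecondOrderEmissions {
    // counts.get("AG").get('F') = how often F followed the two-residue context "AG"
    private final Map<String, Map<Character, Integer>> counts = new HashMap<>();

    /** Count every residue together with the two residues that precede it. */
    void train(String sequence) {
        for (int i = 2; i < sequence.length(); i++) {
            String context = sequence.substring(i - 2, i);
            counts.computeIfAbsent(context, k -> new HashMap<>())
                  .merge(sequence.charAt(i), 1, Integer::sum);
        }
    }

    /** Maximum-likelihood estimate of p(next | context); 0.0 for an unseen context. */
    double probability(String context, char next) {
        Map<Character, Integer> row = counts.get(context);
        if (row == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : row.values()) {
            total += c;
        }
        return row.getOrDefault(next, 0) / (double) total;
    }

    public static void main(String[] args) {
        SecondOrderEmissions d = new SecondOrderEmissions();
        d.train("AGFAGFAGW"); // toy sequence, invented for this example
        System.out.println("p(F | AG) = " + d.probability("AG", 'F'));
        System.out.println("p(W | AG) = " + d.probability("AG", 'W'));
    }
}
```

Each context ("AG", "GF", ...) has its own row that sums to 1, which is the difference from the "emit three at once" alternative: there a single distribution over all triples sums to 1, and successive triples are independent of each other.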
