Hello,

After successfully implementing some TFBS (transcription factor binding site) search models using the ProfileHMM and DP classes, I am ready to attempt some fancier things that will require serious coding. Before I begin, I would like to put some questions to the BioJava users and programmers who have experience with, or interest in, the BioJava HMM classes. I want to implement these features in a way that maximizes usability as simply as possible.
Questions:

1. Many of the TFBSs that I am modeling are palindromic or repetitive. I wish to associate (tie) transition and emission distributions, as prior knowledge, during training in order to enforce a palindromic and/or repetitive pattern and thus also greatly reduce the parameter space.

Example: a p53 TFBS is palindromic and repetitive. A 20-column Profile HMM can be reduced to an HMM with the match-state topology

1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1)

where C(n) is a DNA-complement view over distribution n. With this model I have only 5 match-state emission distributions to train, as opposed to 20. There are also far fewer transition distributions to train if I impose that the transition a->b is the same as b->a or C(b)->C(a), but in the opposite direction.

I wish to implement this in a fashion that does not require any changes to the current Viterbi, forward, Baum-Welch, etc. algorithms, or to the DP class. I have already started writing classes that provide a view (or complement view) over an existing distribution. My plan is to use these views to correlate emission and transition distributions across different columns of the Profile HMM. Has anyone tried this or thought of trying it? Any ideas about how to implement it would be very useful.

2. I wish to use more complicated background models than just a 0th-order background distribution. I would like to use a Dirichlet mixture and/or higher-order Markov models. Has anyone looked into this? Any ideas as to how to implement this in the current release?

-Todd

_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
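
To make the complement-view idea in question 1 concrete, here is a minimal sketch in plain Java. It deliberately does not use the BioJava Distribution or DP classes: every name in it (EmissionDist, CountDist, ComplementView, Dna, TiedProfileDemo) is hypothetical and exists only for illustration, and the count-based estimate with a pseudocount of 1 is an assumption, not anything taken from BioJava. It shows only the tying of match-state emissions; tied transitions would need similar treatment.

```java
import java.util.ArrayList;
import java.util.List;

/** Demo: the 20-column tied topology built from 5 trainable distributions. */
class TiedProfileDemo {
    /** Builds 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1), then repeats that half once. */
    static List<EmissionDist> buildColumns(List<CountDist> base) {
        List<EmissionDist> half = new ArrayList<>(base);
        for (int i = base.size() - 1; i >= 0; i--) {
            half.add(new ComplementView(base.get(i)));
        }
        List<EmissionDist> columns = new ArrayList<>(half);
        columns.addAll(half);
        return columns;
    }

    public static void main(String[] args) {
        List<CountDist> base = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            base.add(new CountDist());
        }
        List<EmissionDist> columns = buildColumns(base);
        columns.get(0).addCount('a', 6.0); // evidence seen at column 1
        System.out.println(columns.size() + " columns, " + base.size() + " trainable distributions");
        System.out.println("P(a | column 1)  = " + columns.get(0).getWeight('a'));
        System.out.println("P(t | column 10) = " + columns.get(9).getWeight('t'));
    }
}

/** Hypothetical minimal emission distribution over the DNA alphabet. */
interface EmissionDist {
    double getWeight(char base);
    void addCount(char base, double count);
}

/** DNA helpers; bases are ordered a, c, g, t so the complement is index 3 - i. */
final class Dna {
    private static final String BASES = "acgt";

    static int index(char base) {
        int i = BASES.indexOf(Character.toLowerCase(base));
        if (i < 0) {
            throw new IllegalArgumentException("Not a DNA base: " + base);
        }
        return i;
    }

    static char complement(char base) {
        return BASES.charAt(3 - index(base));
    }
}

/** Trainable distribution estimated from counts (pseudocount of 1 per base, an assumption). */
final class CountDist implements EmissionDist {
    private final double[] counts = {1.0, 1.0, 1.0, 1.0};

    @Override
    public double getWeight(char base) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        return counts[Dna.index(base)] / total;
    }

    @Override
    public void addCount(char base, double count) {
        counts[Dna.index(base)] += count;
    }
}

/** Complement view: holds no parameters, reads and writes the parent with complemented bases. */
final class ComplementView implements EmissionDist {
    private final EmissionDist parent;

    ComplementView(EmissionDist parent) {
        this.parent = parent;
    }

    @Override
    public double getWeight(char base) {
        return parent.getWeight(Dna.complement(base));
    }

    @Override
    public void addCount(char base, double count) {
        parent.addCount(Dna.complement(base), count);
    }
}
```

Because the view delegates both reads and count updates to its parent, evidence observed at any of the four tied columns is pooled into one underlying distribution, and code that only sees the EmissionDist interface does not need to know which columns are tied. Whether the real training code can be left untouched in the same way depends on how it accumulates counts, which this sketch does not address.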
