Hi -

There are two things going on here. The first is that I believe the profile model presented in BioJava doesn't loop back on itself. I could be wrong; I need to check the code. If this is indeed the case, then the model will not be capable of finding more than one match in a sequence. This can easily be changed by adapting the existing ProfileHMM code in a custom class, or by getting a reference to the MarkovModel and changing its possible transitions.
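To make the loop-back point concrete, here is a minimal sketch in plain Java. It deliberately does not use the BioJava API: the class LoopingMotifHmm, the method motifStarts, the three-part state layout (background, one match state per motif position, trailing background) and every probability in it are assumptions made up for illustration. The only thing that changes between the two runs is whether the last match state goes back to the background state or on to a dead-end trailing state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not BioJava code.
// State 0 = background, states 1..k = motif match states, state k+1 = trailing background.
class LoopingMotifHmm {

    // Log emission probability; the numbers are illustrative assumptions.
    static double emit(int state, char c, String motif) {
        boolean background = state == 0 || state == motif.length() + 1;
        if (background) return Math.log(0.25);
        return Math.log(c == motif.charAt(state - 1) ? 0.85 : 0.05);
    }

    // Viterbi decoding. Returns the 0-based start of every motif copy on the best path.
    // Assumes seq and motif are non-empty.
    static List<Integer> motifStarts(String seq, String motif, boolean loopBack) {
        int k = motif.length(), tail = k + 1, n = k + 2, len = seq.length();

        double[][] trans = new double[n][n];
        for (double[] row : trans) Arrays.fill(row, Double.NEGATIVE_INFINITY);
        trans[0][0] = Math.log(0.9);                        // stay in background
        trans[0][1] = Math.log(0.1);                        // enter the motif
        for (int s = 1; s < k; s++) trans[s][s + 1] = 0.0;  // log(1): step through the motif
        trans[k][loopBack ? 0 : tail] = 0.0;                // the only difference between the models
        trans[tail][tail] = 0.0;                            // trailing state never re-enters the motif

        double[][] score = new double[len][n];
        int[][] back = new int[len][n];
        Arrays.fill(score[0], Double.NEGATIVE_INFINITY);
        score[0][0] = emit(0, seq.charAt(0), motif);        // every path starts in the background
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < n; s++) {
                int arg = 0;                                // best predecessor of state s
                for (int p = 1; p < n; p++) {
                    if (score[t - 1][p] + trans[p][s] > score[t - 1][arg] + trans[arg][s]) arg = p;
                }
                score[t][s] = score[t - 1][arg] + trans[arg][s] + emit(s, seq.charAt(t), motif);
                back[t][s] = arg;
            }
        }

        // Trace back from the best final state, recording each visit to match state 1.
        int state = 0;
        for (int s = 1; s < n; s++) if (score[len - 1][s] > score[len - 1][state]) state = s;
        List<Integer> starts = new ArrayList<>();
        for (int t = len - 1; t >= 0; t--) {
            if (state == 1) starts.add(0, t);
            if (t > 0) state = back[t][state];
        }
        return starts;
    }

    public static void main(String[] args) {
        String motif = "TGCTGCTGCGGGCCC";
        String seq = "AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA";
        System.out.println("no loop:   " + motifStarts(seq, motif, false));
        System.out.println("with loop: " + motifStarts(seq, motif, true));
    }
}
```

In this toy setup the non-looping variant can place the motif only once, while the looping variant is free to report both copies in the test sequence from the original question. A real profile HMM also has insert and delete states, which are left out here to keep the sketch short.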
The other issue is the type of scoring used. ScoreType.Probability calculates the Viterbi path based on the transitions of the model and the emission probabilities of the states. ScoreType.NullModel uses the 'null model', which in your case will be a uniform distribution (essentially random); that will be meaningless, hence the strange result. The null model would be more meaningful if you wanted to model some biased background. ScoreType.ODDS is the log odds of the trained model against the null model. It is most useful when the null model is not uniform, e.g. where you want to distinguish a signal from a biased background. It is most often used for proteins, where the background amino acid distribution is anything but uniform.

Hope this helps,

- Mark

On 5/22/07, Evert-Jan Blom <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> Using a page from the CookBook
> http://www.biojava.org/wiki/BioJava:CookBook:DP:HMM we implemented a
> profile HMM in our application to detect regulatory motif instances.
> To test, we created a model based on 10 identical sequences
> (the test sequence was: TGCTGCTGCGGGCCC).
> The model is subsequently trained using a BaumWelchTrainer and decoded
> using ScoreType.ODDS, ScoreType.Probability and ScoreType.NullModel.
>
> The sequence we use for testing contains 2 motifs, a perfect motif and a
> motif with one mismatch:
>
> AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA
>
> The results of the original HMMER package tell me that there are 2
> instances of the motif present in the test string, whereas the biojava
> package yields very strange results:
>
> Results using ScoreType.ODDS, only the second motif is detected:
>
> {AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA}
> Log Odds = 7.65779871993799
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> i-0
> m-1
> m-2
> m-3
> m-4
> m-5
> m-6
> d-7
> m-8
> m-9
> m-10
> m-11
> m-12
> m-13
> d-14
> d-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
>
> Now the second scorer, only the first motif is detected:
>
> Prob = -95.9806747848816
> i-0
> i-0
> i-0
> i-0
> m-1
> m-2
> m-3
> m-4
> m-5
> m-6
> m-7
> m-8
> m-9
> m-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> i-10
> m-11
> i-11
> m-12
> i-12
> i-12
> i-12
> m-13
> m-14
> m-15
> i-15
> i-15
> i-15
> i-15
> i-15
>
> Now the null model, which seems to make no sense at all:
> Null = -94.11166855273558
> m-1
> m-2
> m-3
> m-4
> m-5
> m-6
> m-7
> m-8
> m-9
> m-10
> m-11
> m-12
> m-13
> m-14
> m-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
> i-15
>
> Is there an option to detect the second motif in the same run just like
> the original HMMER? Or am I missing some option that is not described
> in the tutorial?
>
> Thanks in advance
>
> E.J.Blom
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
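
As a footnote to the scoring discussion in the reply, the sketch below shows what a log-odds score measures and why the choice of null model matters. It is plain Java and not BioJava's implementation: the class LogOddsSketch, the method logOdds, the 0.85/0.05 emission probabilities and the GC-rich background frequencies are all hypothetical values chosen for illustration, and natural logarithms are an arbitrary choice.

```java
// Hypothetical helper, not part of BioJava.
class LogOddsSketch {

    // Natural-log odds of a window under a simple motif model versus a null model.
    // background holds the null-model frequencies in the order A, C, G, T.
    // Assumes window is at least as long as motif and contains only A, C, G, T.
    static double logOdds(String window, String motif, double[] background) {
        double total = 0.0;
        for (int i = 0; i < motif.length(); i++) {
            char c = window.charAt(i);
            double pModel = (c == motif.charAt(i)) ? 0.85 : 0.05;  // assumed emission probabilities
            double pNull = background["ACGT".indexOf(c)];
            total += Math.log(pModel / pNull);
        }
        return total;
    }

    public static void main(String[] args) {
        String motif = "TGCTGCTGCGGGCCC";
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        double[] gcRich = {0.10, 0.40, 0.40, 0.10};  // made-up biased background
        System.out.println("uniform null: " + logOdds(motif, motif, uniform));
        System.out.println("GC-rich null: " + logOdds(motif, motif, gcRich));
    }
}
```

Against the uniform null every matching position contributes the same amount, so the score is just a rescaled model probability. Against the GC-rich null the G and C positions of this motif count for less, because the background already explains them well; that is the situation where a log-odds score carries extra information.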
