I have built a profile HMM. I trained it by hand (setting the emission and transition distributions manually) and was able to generate reasonable Viterbi scores for FASTA sequences. However, when I tried to perform expectation maximization using the BaumWelchTrainer and a training set, things did not go well at all. After the iterations finish, every emission and transition distribution in the trained model is full of NaNs. (Needless to say, Viterbi scoring is then impossible: any attempt to do so generates a NullPointerException on line 650 of SingleDP.java, in the SingleDP.viterbi() method.)

I looked through the mail archives and found that Fabian Schreiber had exactly the same problem when he wrote a BaumWelchTrainer program modeled on the one in BioJava in Anger: "How do I make a ProfileHMM?". His message is from March 25th of this year and received no replies.

I then downloaded the BioJava 1.4 sources and found two additional dp demos that use the BaumWelchTrainer:
   demos/dp/PatternFinder.java
   demos/dp/SearchProfile.java

I compiled and ran both of these demos and found very discouraging results. The iteration scores quickly go to NaN, no matter what sequences I train on (including the demos/dp/fake.fasta file).
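To make the symptom concrete, here is a minimal sketch in plain Java (not BioJava code) of one way a log-space score can turn into NaN. It is only an assumption that something similar happens inside the trainer; I have not confirmed it. The class and method names are hypothetical. When two log probabilities are both negative infinity (two impossible paths), a log-sum-exp that subtracts the maximum without a guard computes -Infinity minus -Infinity, which is NaN.

```java
public class LogSumNanDemo {

    // log(exp(a) + exp(b)) using the usual "subtract the max" trick,
    // but WITHOUT a guard for negative infinity.
    static double logSumUnguarded(double a, double b) {
        double max = Math.max(a, b);
        // If a and b are both -Infinity, a - max is (-Infinity) - (-Infinity) = NaN.
        return max + Math.log(Math.exp(a - max) + Math.exp(b - max));
    }

    // Same computation with an explicit guard: log(0 + x) = log(x).
    static double logSumGuarded(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        return max + Math.log(Math.exp(a - max) + Math.exp(b - max));
    }

    public static void main(String[] args) {
        double impossible = Double.NEGATIVE_INFINITY; // log(0.0)
        System.out.println(logSumUnguarded(impossible, impossible)); // NaN
        System.out.println(logSumGuarded(impossible, impossible));   // -Infinity
    }
}
```

Once a single NaN enters a running score, every later sum or comparison involving it is also NaN, which would match scores that go to NaN and never recover.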

Is there something that I am missing here? Is the BaumWelchTrainer broken? Why are all of the emission and transition distributions full of NaNs after training?
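One possibility I can illustrate, without claiming it is what actually happens inside BaumWelchTrainer, is the re-estimation step dividing by a zero total. The sketch below is plain Java with hypothetical names, not BioJava code. If a state collects no expected counts during a cycle, naive normalization computes 0.0 / 0.0 for every symbol and the whole distribution becomes NaN. Adding a small pseudocount to each symbol avoids the division by zero.

```java
import java.util.Arrays;

public class CountNormalizationDemo {

    // Naive re-estimation: weight[i] = count[i] / total.
    // If every count is 0.0, this is 0.0 / 0.0 = NaN for every entry.
    static double[] normalizeNaive(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        double[] weights = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            weights[i] = counts[i] / total;
        }
        return weights;
    }

    // Same re-estimation, but each symbol first receives a positive pseudocount,
    // so the total can never be zero.
    static double[] normalizeWithPseudocount(double[] counts, double pseudocount) {
        double[] padded = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            padded[i] = counts[i] + pseudocount;
        }
        return normalizeNaive(padded);
    }

    public static void main(String[] args) {
        double[] unseenState = {0.0, 0.0, 0.0, 0.0}; // e.g. a DNA state with no counts
        System.out.println(Arrays.toString(normalizeNaive(unseenState)));
        // prints [NaN, NaN, NaN, NaN]
        System.out.println(Arrays.toString(normalizeWithPseudocount(unseenState, 1.0)));
        // prints [0.25, 0.25, 0.25, 0.25]
    }
}
```

If this is the mechanism, then checking each trained distribution for NaN weights after the first cycle, rather than after the last, should show which state loses its counts first.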

Any insight or investigation here would be greatly appreciated.

Thanks,
Todd Riley

I am re-posting Fabian Schreiber's code because it is shorter than mine:

import org.biojava.bio.dp.*;
import org.biojava.bio.seq.db.*;

// Create the Markov model. The method createCasino() generates an alphabet
// and sets the probabilities for the transitions and emissions.
MarkovModel casino = createCasino();

DP dp = DPFactory.DEFAULT.createDP(casino);

BaumWelchTrainer bwtrainer = new BaumWelchTrainer(dp);

SequenceDB seqDB = new HashSequenceDB("hashdb");
// Here the DB is filled with the sequences --> this works

// Set the stopping criterion: stop once more than 10 cycles have run
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return (ta.getCycle() > 10);
    }
};

// Train the model
bwtrainer.train(seqDB, 1.0, stopper);


_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l
