When you call the train() method of the BaumWelchTrainer you supply it with a SequenceDB. The sequences from this DB are used to optimize the weights of the model.
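To make the re-estimation step concrete, here is a minimal plain-Java sketch of Baum-Welch on a single observation sequence. It does not use BioJava: the class BaumWelchSketch, its fields and its methods are hypothetical names made up for this illustration, and it says nothing about what BaumWelchTrainer does internally. It only shows the general technique, in which the parameters already in the model are the starting point of every cycle, so a hand-set start and a uniform start can end up at different optima.

```java
import java.util.Arrays;

/** Plain-Java Baum-Welch sketch for a discrete HMM (hypothetical class, not the BioJava API). */
class BaumWelchSketch {
    double[] pi;   // pi[i]   = P(first state is i)
    double[][] a;  // a[i][j] = P(next state is j | current state is i)
    double[][] b;  // b[i][k] = P(emit symbol k | state i)

    BaumWelchSketch(double[] pi, double[][] a, double[][] b) {
        this.pi = pi; this.a = a; this.b = b;
    }

    /** Scaled forward pass: fills alpha and returns the per-position scale factors. */
    private double[] forward(int[] obs, double[][] alpha) {
        int n = pi.length;
        double[] c = new double[obs.length];
        for (int t = 0; t < obs.length; t++) {
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                if (t == 0) s = pi[j];
                else for (int i = 0; i < n; i++) s += alpha[t - 1][i] * a[i][j];
                alpha[t][j] = s * b[j][obs[t]];
                c[t] += alpha[t][j];
            }
            for (int j = 0; j < n; j++) alpha[t][j] /= c[t];
        }
        return c;
    }

    /** log P(obs | current parameters), the sum of the logs of the scale factors. */
    double logLikelihood(int[] obs) {
        double ll = 0.0;
        for (double ct : forward(obs, new double[obs.length][pi.length])) ll += Math.log(ct);
        return ll;
    }

    /** One re-estimation cycle; the current parameters serve as the starting point. */
    void step(int[] obs) {
        int n = pi.length, m = b[0].length, T = obs.length;
        double[][] alpha = new double[T][n], beta = new double[T][n];
        double[] c = forward(obs, alpha);
        // Backward pass, scaled with the same factors as the forward pass.
        Arrays.fill(beta[T - 1], 1.0);
        for (int t = T - 2; t >= 0; t--)
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++)
                    beta[t][i] += a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j];
                beta[t][i] /= c[t + 1];
            }
        // Expected counts of initial states, transitions and emissions.
        double[] newPi = new double[n];
        double[][] trans = new double[n][n], emit = new double[n][m];
        for (int t = 0; t < T; t++)
            for (int i = 0; i < n; i++) {
                double gamma = alpha[t][i] * beta[t][i];
                if (t == 0) newPi[i] = gamma;
                emit[i][obs[t]] += gamma;
                for (int j = 0; t < T - 1 && j < n; j++)
                    trans[i][j] += alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / c[t + 1];
            }
        // Turn the expected counts into probabilities.
        double[][] newA = new double[n][], newB = new double[n][];
        for (int i = 0; i < n; i++) { newA[i] = normalize(trans[i]); newB[i] = normalize(emit[i]); }
        pi = newPi; a = newA; b = newB;
    }

    /** Divides by the sum; assumes the sum is positive (true for strictly positive start values). */
    static double[] normalize(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        double[] r = new double[v.length];
        for (int k = 0; k < v.length; k++) r[k] = v[k] / sum;
        return r;
    }

    public static void main(String[] args) {
        int[] obs = {0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1};
        BaumWelchSketch informed = new BaumWelchSketch(new double[]{0.5, 0.5},
                new double[][]{{0.9, 0.1}, {0.1, 0.9}}, new double[][]{{0.8, 0.2}, {0.2, 0.8}});
        BaumWelchSketch uniform = new BaumWelchSketch(new double[]{0.5, 0.5},
                new double[][]{{0.5, 0.5}, {0.5, 0.5}}, new double[][]{{0.5, 0.5}, {0.5, 0.5}});
        for (int cycle = 0; cycle < 20; cycle++) { informed.step(obs); uniform.step(obs); }
        System.out.println("informed start: " + informed.logLikelihood(obs));
        System.out.println("uniform start:  " + uniform.logLikelihood(obs));
    }
}
```

In this toy setting a fully uniform start leaves the two states indistinguishable, because every cycle updates them identically. That is one reason a hand-set starting point can matter, provided the trainer actually uses it.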
However, I have a bad feeling that when you train your model with the BaumWelchTrainer, your previously set counts will be ignored and overwritten. You could check by looking into AbstractModelTrainer.train() (which is what the BaumWelchTrainer extends). You could also run some tests to see whether using a pre-trained model makes any difference to the final outcome.

Does anyone more expert than me on the DP package (i.e. most people) know if the counts are overwritten?

- Mark

[EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
03/12/2004 01:30 PM

To: [EMAIL PROTECTED]
cc: Biojava Mailing List <[EMAIL PROTECTED]>
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining

Sorry for the previous error.

---------------------------- Original Message ----------------------------
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
From: [EMAIL PROTECTED]
Date: Fri, March 12, 2004 12:27 am
To: [EMAIL PROTECTED]
--------------------------------------------------------------------------

Here is the code I have for the training. Using what you told me below, I can retrieve all of the weights that I calculated manually for the HMM (the distributions for the transitions and the distributions over the alphabet of each state). What I do not understand is how to use this information, together with the sequences stored in a file, to run the Baum-Welch algorithm and then retrieve the optimized values it calculates so that I can set them back into my HMM.
    // Retrieve the alphabet of all states
    FiniteAlphabet SA = hmm.stateAlphabet();
    Iterator i = SA.iterator();
    SimpleModelTrainer MT = new SimpleModelTrainer();
    MT.registerModel(hmm);

    // Go through each state
    while (i.hasNext()) {
        Symbol Currentstate = (Symbol) i.next();

        // Retrieve the distribution of all transitions from the current state
        FiniteAlphabet From = hmm.transitionsFrom((State) Currentstate);
        Distribution d = hmm.getWeights((State) Currentstate);
        Iterator i2 = From.iterator();

        // Go through it and look at the weight of each transition
        while (i2.hasNext()) {
            Symbol s = (Symbol) i2.next();
            System.out.println("From state " + Currentstate.getName()
                    + " To State " + s.getName()
                    + " Weight " + d.getWeight(s));
        }

        // Get the distribution over the alphabet of the current state
        Distribution d2 = ((EmissionState) Currentstate).getDistribution();
        FiniteAlphabet IN = (FiniteAlphabet) hmm.emissionAlphabet();
        Iterator i3 = IN.iterator();
        // You can go through it the same way as above, using a while loop

*****************************************************************
This is what I don't understand!
*****************************************************************
Here, we have a set of training sequences stored in a file in FASTA format that I'd like to use with the Baum-Welch algorithm to optimize the transition distributions mentioned above.

        // This is the file with all the training sequences
        BufferedInputStream is = new BufferedInputStream(new FileInputStream("z:/Sequences.faa"));
        // Load the file with the SequenceDB class
        SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);
        // Use 100 cycles as the stopping criterion
        StoppingCriteria stopper = new StoppingCriteria() {
            public boolean isTrainingComplete(TrainingAlgorithm ta) {
                return (ta.getCycle() > 100);
            }
        };

*****************************************
This part is what I am clueless about
*****************************************
        // How do I optimize my HMM with the Baum-Welch algorithm and retrieve
        // the optimized values?
        // How do I train the distribution above with
        // Baum-Welch and the sequences that I have?
        DP dp = DPFactory.DEFAULT.createDP(hmm);
        BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
    }

PS: I do not know why you are helping all of us here, but thank you. It makes BioJava a lot easier to deal with.

Steve

> Hi Stephane -
>
> Within EmissionState you can set a Distribution that contains emission probabilities for the Symbols in the state's emission alphabet using the setDistribution method. This Distribution will be your predetermined weights.
>
> To set the transition probabilities you can use setWeights(State source, Distribution weights). The source is the state you are transitioning from, and the weights are the probabilities of transitioning to any State that the source connects to. Because States implement Symbol, you can put them in a Distribution.
>
> To make a Distribution of the States that state 'a' could connect to, use the following pseudocode:
>
> State a;
> Model m;
> FiniteAlphabet endPoints;
>
> endPoints = m.transitionsFrom(a);
> Distribution d =
>     DistributionFactory.DEFAULT.createDistribution(endPoints);
>
> // You can then train d or set its weights and put it back in the model with
>
> m.setWeights(a, d);
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn, Science Park II
> Singapore 117528
>
> phone +65 6722 2973
> fax +65 6722 2910
>
> [EMAIL PROTECTED]
> Sent by: [EMAIL PROTECTED]
> 03/12/2004 06:11 AM
>
> To: "Biojava Mailing List" <[EMAIL PROTECTED]>
> cc:
> Subject: [Biojava-l] Parameter Settings in BaumWelchTraining
>
> Hi all. I'm trying to optimize the state transition probabilities for my HMM. I have already set them to values which I think are pretty good. Since I know that Baum-Welch can only optimize the scores up to a local maximum, I thought of using the parameters I calculated as a starting point.
> The problem is that I don't know how!
>
> I followed the example in BioJava:
>
> ....
> // train the model to have uniform parameters
> ModelTrainer mt = new SimpleModelTrainer();
> // register the model to train
> mt.registerModel(hmm);
>
> I want to use the values already set in my HMM as the starting parameters for Baum-Welch. I don't want to use the uniform distribution as indicated below!
>
> // as no other counts are being used the null weight will cause everything to be uniform
> mt.setNullModelWeight(1.0);
> mt.train();
>
> I tried adding counts and looking up examples on the net, but ended up more confused than when I started. How do I use addCounts to make this work?
>
> Stephane Acoca
> Master's Student
> McGill Center for Bioinformatics
>
> _______________________________________________
> Biojava-l mailing list - [EMAIL PROTECTED]
> http://biojava.org/mailman/listinfo/biojava-l