I agree. If the BaumWelchTrainer does cause problems, one could always write a different implementation of ModelTrainer.
- Mark



Dan Bolser <[EMAIL PROTECTED]>
03/12/2004 04:47 PM

To: Mark Schreiber/GP/[EMAIL PROTECTED]
cc: [EMAIL PROTECTED], Biojava Mailing List <[EMAIL PROTECTED]>
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining]

On Fri, 12 Mar 2004 [EMAIL PROTECTED] wrote:

> When you call the train() method of the BaumWelchTrainer you supply it
> with a SequenceDB. The sequences from this DB are used to optimize the
> weights of the model.
>
> However, I have a bad feeling that when you train your model with the
> BaumWelchTrainer your previously set counts will be ignored and
> overwritten. You could check by looking into AbstractModelTrainer.train()
> (which is what the BaumWelchTrainer extends). You could also run some
> tests to see if using a pre-trained model makes any difference to the
> final outcome. Does anyone more expert than me on the DP package (i.e. most
> people) know if the counts are overwritten?

The idea sounds good either way, so it would be a shame to have to reject
it on the basis of a technicality :)

Cheers

>
> - Mark
>
> [EMAIL PROTECTED]
> Sent by: [EMAIL PROTECTED]
> 03/12/2004 01:30 PM
>
> To: [EMAIL PROTECTED]
> cc: Biojava Mailing List <[EMAIL PROTECTED]>
> Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining]
>
> Sorry for the previous error.
> ---------------------------- Original Message ----------------------------
> Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
> From: [EMAIL PROTECTED]
> Date: Fri, March 12, 2004 12:27 am
> To: [EMAIL PROTECTED]
> --------------------------------------------------------------------------
>
> Here is the code I have for the training. Using what you told me below, I
> can retrieve all of the weights that I calculated manually for the HMM
> (distributions for the transitions and distributions for the alphabet of
> each state).
> What I do not understand is how to use this information and
> the sequences stored in a file to run the Baum-Welch algorithm and then
> retrieve the optimized values calculated by the algorithm to set them back
> into my HMM.
>
> //Retrieve the alphabet of all states
> FiniteAlphabet SA = hmm.stateAlphabet();
> Iterator i = SA.iterator();
>
> SimpleModelTrainer MT = new SimpleModelTrainer();
> MT.registerModel(hmm);
>
> //go through each state
> while(i.hasNext())
> {Symbol Currentstate = (Symbol)i.next();
>
> //Retrieve the distribution of all transitions from the current state
> FiniteAlphabet From = hmm.transitionsFrom((State)Currentstate);
> Distribution d = hmm.getWeights((State)Currentstate);
> Iterator i2 = From.iterator();
>
> //go through it and look at all the weights for each of the transitions
> while(i2.hasNext())
> {Symbol s = (Symbol)i2.next();
> System.out.println("From state "+Currentstate.getName()+
> " to state "+s.getName()+
> " weight "+d.getWeight(s));}
>
> //get the emission distribution of the current state
> //(only EmissionStates have one; casting a dot state would fail)
> if (Currentstate instanceof EmissionState)
> {Distribution d2 = ((EmissionState)Currentstate).getDistribution();
> FiniteAlphabet IN = (FiniteAlphabet)hmm.emissionAlphabet();
> Iterator i3 = IN.iterator();
> //you can go through it the same way as above using a while loop
> }
> *****************************************************************
> This is what I don't understand!!!!
> *****************************************************************
> Here, we have a set of training sequences stored in a file in FASTA format
> that I'd like to use with the Baum-Welch algorithm to optimize the
> transition distributions mentioned above.
>
> //This is the file with all the training sequences
> BufferedInputStream is = new BufferedInputStream(new
> FileInputStream("z:/Sequences.faa"));
>
> //Load the file with the SequenceDB class
> SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);
>
> //use 100 cycles as the stopping criterion
> StoppingCriteria stopper = new StoppingCriteria()
> {public boolean isTrainingComplete(TrainingAlgorithm ta)
> {return (ta.getCycle() > 100);}};
>
> *****************************************
> This part is what I am clueless about
> *****************************************
> //How do I optimize my HMM with the Baum-Welch algorithm and retrieve
> //the optimized values? How do I train the distribution above with
> //Baum-Welch and the sequences that I have?
> DP dp = DPFactory.DEFAULT.createDP(hmm);
> BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
> }
>
> PS: I do not know why you are helping all of us here, but thank you. It
> makes BioJava a lot easier to deal with.
>
> Steve
>
> > Hi Stephane -
> >
> > Within EmissionState you can set a Distribution that contains emission
> > probabilities for the Symbols in the state's emission alphabet using the
> > setDistribution method. This Distribution will hold your predetermined
> > weights.
> >
> > To set the transition probabilities you can use setWeights(State
> > source, Distribution weights). The source is the state you are
> > transitioning from, and the weights are the probabilities of transitioning
> > to each State that the source connects to. Because States implement
> > Symbol, you can put them in a Distribution.
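The rule described here (the weights for a source state must form a probability distribution over exactly the states that source connects to) can be sketched in plain Java without BioJava. TransitionTable, setWeights, getWeight and normalize are hypothetical names chosen for illustration; in BioJava itself the equivalent steps go through MarkovModel.transitionsFrom, DistributionFactory.DEFAULT.createDistribution and setWeights.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java analogue (not BioJava): the weights for a source state
// must be a distribution over exactly the states that source connects to.
final class TransitionTable {
    private final Map<String, Set<String>> allowed;                 // source -> reachable states
    private final Map<String, Map<String, Double>> weights = new LinkedHashMap<>();

    TransitionTable(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    /** Hypothetical counterpart of setWeights(source, distribution); rejects invalid input. */
    void setWeights(String source, Map<String, Double> dist) {
        Set<String> ends = allowed.get(source);
        if (ends == null || !ends.equals(dist.keySet())) {
            throw new IllegalArgumentException("weights must cover exactly the transitions from " + source);
        }
        double sum = 0.0;
        for (double w : dist.values()) {
            if (w < 0.0) throw new IllegalArgumentException("negative weight");
            sum += w;
        }
        if (Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights from " + source + " sum to " + sum);
        }
        weights.put(source, new LinkedHashMap<>(dist));
    }

    double getWeight(String source, String dest) {
        return weights.get(source).get(dest);
    }

    /** Rescales hand-set raw scores so that they sum to 1. */
    static Map<String, Double> normalize(Map<String, Double> raw) {
        double sum = 0.0;
        for (double v : raw.values()) sum += v;
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : raw.entrySet()) out.put(e.getKey(), e.getValue() / sum);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> allowed = new LinkedHashMap<>();
        allowed.put("a", Set.of("a", "b", "end"));                   // state 'a' connects to a, b, end
        TransitionTable model = new TransitionTable(allowed);

        Map<String, Double> raw = new LinkedHashMap<>();
        raw.put("a", 8.0); raw.put("b", 1.5); raw.put("end", 0.5);
        model.setWeights("a", normalize(raw));
        System.out.println(model.getWeight("a", "a"));               // prints 0.8
    }
}
```

Validating the key set against the reachable states mirrors why the BioJava pseudocode builds its Distribution over transitionsFrom(a) rather than over the whole state alphabet.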
> >
> > To make a Distribution over the States that state 'a' could connect to, use
> > the following pseudo code:
> >
> > State a;
> > MarkovModel m;
> > FiniteAlphabet endPoints;
> >
> > endPoints = m.transitionsFrom(a);
> > Distribution d =
> > DistributionFactory.DEFAULT.createDistribution(endPoints);
> >
> > //You can then train d or set its weights and put it back in the model with
> >
> > m.setWeights(a, d);
> >
> > Mark Schreiber
> > Principal Scientist (Bioinformatics)
> >
> > Novartis Institute for Tropical Diseases (NITD)
> > 1 Science Park Road
> > #04-14 The Capricorn, Science Park II
> > Singapore 117528
> >
> > phone +65 6722 2973
> > fax +65 6722 2910
> >
> > [EMAIL PROTECTED]
> > Sent by: [EMAIL PROTECTED]
> > 03/12/2004 06:11 AM
> >
> > To: "Biojava Mailing List" <[EMAIL PROTECTED]>
> > cc:
> > Subject: [Biojava-l] Parameter Settings in
> > BaumWelchTraining
> >
> > Hi all. I'm trying to optimize the state transition probabilities for
> > my HMM. I have already set them to values which I think are pretty good.
> > Since I know Baum-Welch can only improve the scores up to a local
> > maximum, I thought of using the parameters I calculated as a starting
> > point. The problem is that I don't know how!
> >
> > I followed the example in BioJava:
> >
> > ....
> > //train the model to have uniform parameters
> > ModelTrainer mt = new SimpleModelTrainer();
> > //register the model to train
> > mt.registerModel(hmm);
> >
> > I want to use the values already set in my HMM as the starting
> > parameters for Baum-Welch. I don't want to use the uniform
> > distribution as indicated below!
> >
> > //as no other counts are being used the null weight will cause
> > //everything to be uniform
> > mt.setNullModelWeight(1.0);
> > mt.train();
> >
> > I tried adding counts and looking up examples on the net but ended up
> > more confused than when I started. How do I use addCounts to make this
> > work?
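On the addCounts and setNullModelWeight question: the quoted comment says that with no other counts the null weight makes everything uniform. The plain-Java sketch below illustrates that idea with an assumed pseudocount formula, p(s) = (count(s) + w / |alphabet|) / (total + w), and shows how counts seeded in proportion to hand-set probabilities keep those values. The formula and the names train and seedCounts are hypothetical; this is not SimpleModelTrainer's source, and whether BaumWelchTrainer later discards such counts is exactly the open question in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration (not BioJava code) of count-based training with a
// null-model pseudocount weight w:  p(s) = (count(s) + w / |alphabet|) / (total + w)
final class PseudocountSketch {

    /** Converts counts to probabilities, smoothing toward a uniform null model. */
    static Map<String, Double> train(Map<String, Double> counts, double nullWeight) {
        double total = 0.0;
        for (double c : counts.values()) total += c;
        double uniform = 1.0 / counts.size();
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            probs.put(e.getKey(), (e.getValue() + nullWeight * uniform) / (total + nullWeight));
        }
        return probs;
    }

    /** Counts proportional to hand-set probabilities, as if 'strength' observations had been made. */
    static Map<String, Double> seedCounts(Map<String, Double> handSet, double strength) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : handSet.entrySet()) {
            counts.put(e.getKey(), e.getValue() * strength);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Double> noCounts = new LinkedHashMap<>();
        for (String s : new String[] {"a", "b", "c", "d"}) noCounts.put(s, 0.0);
        System.out.println(train(noCounts, 1.0));                     // uniform: every entry 0.25

        Map<String, Double> handSet = new LinkedHashMap<>();
        handSet.put("a", 0.7); handSet.put("b", 0.1); handSet.put("c", 0.1); handSet.put("d", 0.1);
        System.out.println(train(seedCounts(handSet, 1000.0), 1.0)); // close to the hand-set values
    }
}
```

The larger the seeding strength relative to the null weight, the closer the result stays to the hand-set values; a strength near zero falls back to the uniform case.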
> >
> > Stephane Acoca
> > Master's Student
> > McGill Center for Bioinformatics
> >
> > _______________________________________________
> > Biojava-l mailing list - [EMAIL PROTECTED]
> > http://biojava.org/mailman/listinfo/biojava-l
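The step Stephane asked about is never shown in the thread. For reference, the BioJava tutorial examples of this period start Baum-Welch with a call of the form bwt.train(db, 1.0, stopper) (training SequenceDB, null-model weight, StoppingCriteria); check that signature against the BioJava version in use. The block below does not use BioJava: it is a hypothetical, self-contained plain-Java sketch of Baum-Welch (scaled forward-backward plus re-estimation) for a small discrete HMM whose starting parameters are supplied by the caller. That is one way to run the experiment Mark suggests, comparing a hand-set start with a uniform one. The class, its methods and the numbers in main are illustrative assumptions.

```java
import java.util.Arrays;

// Hypothetical plain-Java Baum-Welch sketch for a discrete HMM (not BioJava).
// The caller supplies the starting parameters; emission probabilities are assumed non-zero.
final class BaumWelchSketch {
    double[] pi;   // pi[i]   = P(first state is i)
    double[][] a;  // a[i][j] = P(next state j | state i)
    double[][] b;  // b[i][k] = P(symbol k | state i)

    BaumWelchSketch(double[] pi, double[][] a, double[][] b) {  // copies, caller's arrays untouched
        this.pi = pi.clone();
        this.a = Arrays.stream(a).map(double[]::clone).toArray(double[][]::new);
        this.b = Arrays.stream(b).map(double[]::clone).toArray(double[][]::new);
    }

    /** Scaled forward pass: alpha rows are normalized, scale[t] keeps the factors. Returns log P(obs). */
    double forward(int[] obs, double[][] alpha, double[] scale) {
        int n = pi.length;
        double logP = 0.0;
        for (int t = 0; t < obs.length; t++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                double v = 0.0;
                if (t == 0) v = pi[j];
                else for (int i = 0; i < n; i++) v += alpha[t - 1][i] * a[i][j];
                alpha[t][j] = v * b[j][obs[t]];
                sum += alpha[t][j];
            }
            scale[t] = sum;
            for (int j = 0; j < n; j++) alpha[t][j] /= sum;
            logP += Math.log(sum);
        }
        return logP;
    }

    double logLikelihood(int[][] seqs) {
        double total = 0.0;
        for (int[] o : seqs) total += forward(o, new double[o.length][pi.length], new double[o.length]);
        return total;
    }

    /** One EM cycle over all sequences; returns the log-likelihood before the update. */
    double iterate(int[][] seqs) {
        int n = pi.length, m = b[0].length;
        double[] piAcc = new double[n];
        double[][] aAcc = new double[n][n], bAcc = new double[n][m];
        double total = 0.0;
        for (int[] o : seqs) {
            int len = o.length;
            double[][] alpha = new double[len][n], beta = new double[len][n];
            double[] scale = new double[len];
            total += forward(o, alpha, scale);
            Arrays.fill(beta[len - 1], 1.0);
            for (int t = len - 2; t >= 0; t--)                  // scaled backward pass
                for (int i = 0; i < n; i++) {
                    double v = 0.0;
                    for (int j = 0; j < n; j++) v += a[i][j] * b[j][o[t + 1]] * beta[t + 1][j];
                    beta[t][i] = v / scale[t + 1];
                }
            for (int t = 0; t < len; t++)
                for (int i = 0; i < n; i++) {
                    double g = alpha[t][i] * beta[t][i];        // P(state i at t | sequence)
                    if (t == 0) piAcc[i] += g;
                    bAcc[i][o[t]] += g;
                    if (t + 1 == len) continue;                 // no transition after the last symbol
                    for (int j = 0; j < n; j++)                 // expected i -> j transitions
                        aAcc[i][j] += alpha[t][i] * a[i][j] * b[j][o[t + 1]] * beta[t + 1][j] / scale[t + 1];
                }
        }
        pi = normalize(piAcc, pi);
        for (int i = 0; i < n; i++) { a[i] = normalize(aAcc[i], a[i]); b[i] = normalize(bAcc[i], b[i]); }
        return total;
    }

    /** Expected counts to probabilities; keeps the old row if a state was never used. */
    static double[] normalize(double[] acc, double[] old) {
        double s = Arrays.stream(acc).sum();
        if (s == 0.0) return old.clone();
        double[] r = new double[acc.length];
        for (int k = 0; k < acc.length; k++) r[k] = acc[k] / s;
        return r;
    }

    /** At most maxCycles cycles (analogous to ta.getCycle() > 100); stops early once the gain is below tol. */
    double train(int[][] seqs, int maxCycles, double tol) {
        double prev = Double.NEGATIVE_INFINITY;
        for (int cycle = 0; cycle < maxCycles; cycle++) {
            double ll = iterate(seqs);
            if (ll - prev < tol) break;
            prev = ll;
        }
        return logLikelihood(seqs);
    }

    public static void main(String[] args) {
        BaumWelchSketch hmm = new BaumWelchSketch(new double[] {0.6, 0.4},   // hypothetical hand-set start
                new double[][] {{0.8, 0.2}, {0.3, 0.7}}, new double[][] {{0.9, 0.1}, {0.2, 0.8}});
        int[][] seqs = {{0, 0, 1, 0, 0, 0, 1, 1}, {1, 1, 0, 1, 1, 1, 0, 0}};
        double before = hmm.logLikelihood(seqs);
        double after = hmm.train(seqs, 100, 1e-9);
        System.out.printf("log-likelihood %.4f -> %.4f%n%s%n", before, after, Arrays.deepToString(hmm.a));
    }
}
```

Running the same sequences from an all-0.5 start gives the uniform-start comparison. With a perfectly uniform start both states receive identical updates and never separate, which is one concrete reason a good hand-set starting point matters for a local method like Baum-Welch.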