Yeah, I kind of figured that would be the case: the inherent problems of floating point arithmetic, rounding errors, and all that kind of thing.
Quoting Miles Osborne <[email protected]>:

> the good thing about probabilities is that they should sum to one
>
> (but you can get numerical errors giving you slightly more / less ...)
>
> Miles
>
> 2009/7/27 James Read <[email protected]>
>
>> Ok. Thanks. I think I understand this now. I also think I have found
>> the bug in the code which was causing the dodgy output.
>>
>> So, in conclusion, would you say that a good automated check of
>> whether the code is working correctly would be to add up the
>> probabilities at the end of the EM iterations and check that they
>> add up to 1 (or slightly less)?
>>
>> James
>>
>> Quoting Philipp Koehn <[email protected]>:
>>
>>> Hi,
>>>
>>> because the final loop in each iteration is:
>>>
>>> // estimate probabilities
>>> for all foreign words f
>>>   for all English words e
>>>     t(e|f) = count(e|f) / total(f)
>>>
>>> As I said, there are two normalizations: one on the
>>> sentence level, the other on the corpus level.
>>>
>>> -phi
>>>
>>> On Mon, Jul 27, 2009 at 10:30 PM, James Read <[email protected]> wrote:
>>>> In that case I really don't see how the code is guaranteed to give
>>>> results which add up to 1.
>>>>
>>>> Quoting Philipp Koehn <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> this is LaTeX {algorithmic} code.
>>>>>
>>>>> count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>>
>>>>> means
>>>>>
>>>>> count(e|f) += t(e|f) / s-total(e)
>>>>>
>>>>> So, you got that right.
>>>>>
>>>>> -phi
>>>>>
>>>>> On Mon, Jul 27, 2009 at 10:18 PM, James Read <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> this seems to be pretty much what I implemented. What exactly do
>>>>>> you mean by these three lines?
>>>>>>
>>>>>> \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>>> \STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>>> \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>>>>
>>>>>> What do you mean by $\frac? The pseudocode I was using shows these
>>>>>> lines as a simple division, and that is what my code does, i.e.
>>>>>>
>>>>>> t(e|f) = count(e|f) / total(f)
>>>>>>
>>>>>> In C, something like:
>>>>>>
>>>>>> for ( f = 0; f < size_source; f++ )
>>>>>> {
>>>>>>     for ( e = 0; e < size_target; e++ )
>>>>>>     {
>>>>>>         t[f][e] = count[f][e] / total[f];
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Is this the kind of thing you mean?
>>>>>>
>>>>>> Thanks
>>>>>> James
>>>>>>
>>>>>> Quoting Philipp Koehn <[email protected]>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think there was a flaw in some versions of the pseudocode.
>>>>>>> The probabilities certainly need to add up to one. There are
>>>>>>> two normalizations going on in the algorithm: one on the sentence
>>>>>>> level (so the probabilities of all alignments add up to one) and
>>>>>>> one on the word level.
>>>>>>>
>>>>>>> Here is the most recent version:
>>>>>>>
>>>>>>> \REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
>>>>>>> \ENSURE translation prob. $t(e|f)$
>>>>>>> \STATE initialize $t(e|f)$ uniformly
>>>>>>> \WHILE{not converged}
>>>>>>>   \STATE \COMMENT{initialize}
>>>>>>>   \STATE count($e|f$) = 0 {\bf for all} $e,f$
>>>>>>>   \STATE total($f$) = 0 {\bf for all} $f$
>>>>>>>   \FORALL{sentence pairs ({\bf e},{\bf f})}
>>>>>>>     \STATE \COMMENT{compute normalization}
>>>>>>>     \FORALL{words $e$ in {\bf e}}
>>>>>>>       \STATE s-total($e$) = 0
>>>>>>>       \FORALL{words $f$ in {\bf f}}
>>>>>>>         \STATE s-total($e$) += $t(e|f)$
>>>>>>>       \ENDFOR
>>>>>>>     \ENDFOR
>>>>>>>     \STATE \COMMENT{collect counts}
>>>>>>>     \FORALL{words $e$ in {\bf e}}
>>>>>>>       \FORALL{words $f$ in {\bf f}}
>>>>>>>         \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>>>>         \STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>>>>       \ENDFOR
>>>>>>>     \ENDFOR
>>>>>>>   \ENDFOR
>>>>>>>   \STATE \COMMENT{estimate probabilities}
>>>>>>>   \FORALL{foreign words $f$}
>>>>>>>     \FORALL{English words $e$}
>>>>>>>       \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>>>>>     \ENDFOR
>>>>>>>   \ENDFOR
>>>>>>> \ENDWHILE
>>>>>>>
>>>>>>> -phi
>>>>>>>
>>>>>>> On Sun, Jul 26, 2009 at 5:24 PM, James Read <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have implemented the EM Model 1 algorithm as outlined in Koehn's
>>>>>>>> lecture notes. I was surprised to find that the raw output of the
>>>>>>>> algorithm gives a translation table where, for any particular
>>>>>>>> source word, the sum of the probabilities over all possible
>>>>>>>> target words is far greater than 1.
>>>>>>>>
>>>>>>>> Is this normal?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> James
>>>>>>>>
>>>>>>>> --
>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> [email protected]
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
