Yeah, I kind of figured this would be the case, given the inherent
problems of floating-point numbers, rounding errors, and all that kind
of thing.

Quoting Miles Osborne <[email protected]>:

> the good thing about probabilities is that they should sum to one
>
> (but you can get numerical errors giving you slightly more / less ...)
> Miles
>
> 2009/7/27 James Read <[email protected]>
>
>> Ok. Thanks. I think I understand this now. I also think I have found
>> the bug in the code which was causing the dodgy output.
>>
>> So, in conclusion, would you say that a good automated check to see if
>> the code is working correctly would be to add up the probabilities at
>> the end of the EM iterations and check that probabilities add up to 1
>> (or slightly less)?
>>
>> James
>>
>> Quoting Philipp Koehn <[email protected]>:
>>
>> > Hi,
>> >
>> > because the final loop in each iteration is:
>> >
>> > // estimate probabilities
>> > for all foreign words f
>> >   for all English words e
>> >     t(e|f) = count(e|f) / total(f)
>> >
>> > As I said, there are two normalizations: one on the
>> > sentence level, the other on the corpus level.
>> >
>> > -phi
>> >
>> >> On Mon, Jul 27, 2009 at 10:30 PM, James Read <[email protected]> wrote:
>> >> In that case I really don't see how the code is guaranteed to give
>> >> results which add up to 1.
>> >>
>> >> Quoting Philipp Koehn <[email protected]>:
>> >>
>> >>> Hi,
>> >>>
>> >>> this is LaTex {algorithmic} code.
>> >>>
>> >>> count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>> >>>
>> >>> means
>> >>>
>> >>> count(e|f) += t(e|f) / s-total(e)
>> >>>
>> >>> So, you got that right.
>> >>>
>> >>> -phi
>> >>>
>> >>> On Mon, Jul 27, 2009 at 10:18 PM, James Read <[email protected]> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> this seems to be pretty much what I implemented. What exactly do you
>> >>>> mean by these three lines?:
>> >>>>
>> >>>> \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>> >>>> \STATE total($f$)   += $\frac{t(e|f)}{\text{s-total}(e)}$
>> >>>> \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>> >>>>
>> >>>> What do you mean by $\frac? The pseudocode I was using shows these
>> >>>> lines as a simple division, and this is what my code does, i.e.
>> >>>>
>> >>>> t(e|f) = count(e|f) / total(f)
>> >>>>
>> >>>> In C code something like:
>> >>>>
>> >>>> for ( f = 0; f < size_source; f++ )
>> >>>> {
>> >>>>  for ( e = 0; e < size_target; e++ )
>> >>>>  {
>> >>>>   t[f][e] = count[f][e] / total[f];
>> >>>>  }
>> >>>> }
>> >>>>
>> >>>>
>> >>>> Is this the kind of thing you mean?
>> >>>>
>> >>>> Thanks
>> >>>> James
>> >>>>
>> >>>> Quoting Philipp Koehn <[email protected]>:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I think there was a flaw in some versions of the pseudo code.
>> >>>>> The probabilities certainly need to add up to one. There are
>> >>>>> two normalizations going on in the algorithm: one on the sentence
>> >>>>> level (so the probability of all alignments add up to one) and
>> >>>>> one on the word level.
>> >>>>>
>> >>>>> Here the most recent version:
>> >>>>>
>> >>>>> \REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
>> >>>>> \ENSURE translation prob. $t(e|f)$
>> >>>>> \STATE initialize $t(e|f)$ uniformly
>> >>>>> \WHILE{not converged}
>> >>>>>  \STATE \COMMENT{initialize}
>> >>>>>  \STATE count($e|f$) = 0 {\bf for all} $e,f$
>> >>>>>  \STATE total($f$) = 0 {\bf for all} $f$
>> >>>>>  \FORALL{sentence pairs ({\bf e},{\bf f})}
>> >>>>>   \STATE \COMMENT{compute normalization}
>> >>>>>   \FORALL{words $e$ in {\bf e}}
>> >>>>>     \STATE s-total($e$) = 0
>> >>>>>     \FORALL{words $f$ in {\bf f}}
>> >>>>>       \STATE s-total($e$) += $t(e|f)$
>> >>>>>     \ENDFOR
>> >>>>>   \ENDFOR
>> >>>>>   \STATE \COMMENT{collect counts}
>> >>>>>   \FORALL{words $e$ in {\bf e}}
>> >>>>>     \FORALL{words $f$ in {\bf f}}
>> >>>>>       \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>> >>>>>       \STATE total($f$)   += $\frac{t(e|f)}{\text{s-total}(e)}$
>> >>>>>     \ENDFOR
>> >>>>>   \ENDFOR
>> >>>>>  \ENDFOR
>> >>>>>  \STATE \COMMENT{estimate probabilities}
>> >>>>>  \FORALL{foreign words $f$}
>> >>>>>   \FORALL{English words $e$}
>> >>>>>     \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>> >>>>>   \ENDFOR
>> >>>>>  \ENDFOR
>> >>>>> \ENDWHILE
>> >>>>>
>> >>>>> -phi
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Jul 26, 2009 at 5:24 PM, James Read <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I have implemented the EM Model 1 algorithm as outlined in Koehn's
>> >>>>>> lecture notes. I was surprised to find that the raw output of the
>> >>>>>> algorithm gives a translation table in which, for any particular
>> >>>>>> source word, the sum of the probabilities over all possible target
>> >>>>>> words is far greater than 1.
>> >>>>>>
>> >>>>>> Is this normal?
>> >>>>>>
>> >>>>>> Thanks
>> >>>>>> James
>> >>>>>>
>> >>>>>> --
>> >>>>>> The University of Edinburgh is a charitable body, registered in
>> >>>>>> Scotland, with registration number SC005336.
>> >>>>>>
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Moses-support mailing list
>> >>>>>> [email protected]
>> >>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>>
>>
>>
>>
>
>
>
>






