Hi,
I think there was a flaw in some versions of the pseudocode.
The probabilities certainly need to add up to one. There are
two normalizations going on in the algorithm: one on the sentence
level (so that the probabilities of all alignments add up to one)
and one on the word level (so that the translation probabilities
$t(e|f)$ for each foreign word $f$ add up to one).
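In equation form: within each sentence pair, the fractional count
for a word pair is $\frac{t(e|f)}{\text{s-total}(e)}$ with
$\text{s-total}(e) = \sum_{f} t(e|f)$, and at the end of each
iteration setting $t(e|f) = \frac{\text{count}(e|f)}{\text{total}(f)}$
with $\text{total}(f) = \sum_{e} \text{count}(e|f)$ guarantees
$\sum_{e} t(e|f) = 1$ for every foreign word $f$.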
Here is the most recent version:
\REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
\ENSURE translation prob. $t(e|f)$
\STATE initialize $t(e|f)$ uniformly
\WHILE{not converged}
\STATE \COMMENT{initialize}
\STATE count($e|f$) = 0 {\bf for all} $e,f$
\STATE total($f$) = 0 {\bf for all} $f$
\FORALL{sentence pairs ({\bf e},{\bf f})}
\STATE \COMMENT{compute normalization}
\FORALL{words $e$ in {\bf e}}
\STATE s-total($e$) = 0
\FORALL{words $f$ in {\bf f}}
\STATE s-total($e$) += $t(e|f)$
\ENDFOR
\ENDFOR
\STATE \COMMENT{collect counts}
\FORALL{words $e$ in {\bf e}}
\FORALL{words $f$ in {\bf f}}
\STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
\STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
\ENDFOR
\ENDFOR
\ENDFOR
\STATE \COMMENT{estimate probabilities}
\FORALL{foreign words $f$}
\FORALL{English words $e$}
\STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
\ENDFOR
\ENDFOR
\ENDWHILE
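For reference, here is a minimal Python sketch of the same loop
(variable names mirror the pseudocode; the function name
train_model1, the tokenized input format, and the fixed iteration
count in place of a convergence test are my own choices for
illustration):

from collections import defaultdict

def train_model1(sentence_pairs, iterations=10):
    # sentence_pairs: list of (e_words, f_words), each a list of tokens
    # initialize t(e|f) uniformly over the English vocabulary
    e_vocab = {e for e_words, _ in sentence_pairs for e in e_words}
    uniform = 1.0 / len(e_vocab)
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):  # stands in for "while not converged"
        count = defaultdict(float)  # count(e|f)
        total = defaultdict(float)  # total(f)
        for e_words, f_words in sentence_pairs:
            # compute normalization: s-total(e) = sum over f of t(e|f)
            s_total = {e: sum(t[(e, f)] for f in f_words) for e in e_words}
            # collect fractional counts
            for e in e_words:
                for f in f_words:
                    c = t[(e, f)] / s_total[e]
                    count[(e, f)] += c
                    total[f] += c
        # estimate probabilities: t(e|f) = count(e|f) / total(f)
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

After training, the word-level normalization can be checked directly:
sum(p for (e, f), p in t.items() if f == "das") should come out as 1.0
for any foreign word that occurs in the data.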
-phi
On Sun, Jul 26, 2009 at 5:24 PM, James Read <[email protected]> wrote:
> Hi,
>
> I have implemented the EM Model 1 algorithm as outlined in Koehn's
> lecture notes. I was surprised to find that the raw output of the
> algorithm gives a translation table in which, for any particular
> source word, the sum of the probabilities over all possible target
> words is far greater than 1.
>
> Is this normal?
>
> Thanks
> James
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support