Hi,
the probabilities are guaranteed to add up to one because the final loop in each iteration is:
// estimate probabilities
for all foreign words f
for all English words e
t(e|f) = count(e|f) / total(f)
Since total(f) receives exactly the same increments as the count(e|f)
entries, total(f) is the sum of count(e|f) over all e, so after this
division the values t(e|f) add up to one over e for each f. As I said,
there are two normalizations: one on the sentence level, the other on
the corpus level.
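
For concreteness, here is a minimal sketch of one EM iteration in C, in
the style of your C snippet below. The toy corpus, the array sizes, and
all names (src, trg, s_total, em_iteration) are illustrative assumptions,
not anything from the thread, and like the pseudocode it omits the NULL
word:

#include <stdio.h>

#define SIZE_SOURCE 3  /* foreign vocabulary size (toy value) */
#define SIZE_TARGET 3  /* English vocabulary size (toy value) */
#define NUM_PAIRS   2  /* number of sentence pairs (toy value) */
#define MAX_LEN     2  /* words per sentence in this toy corpus */

double t[SIZE_SOURCE][SIZE_TARGET];     /* t(e|f) */
double count[SIZE_SOURCE][SIZE_TARGET]; /* count(e|f) */
double total[SIZE_SOURCE];              /* total(f) */

/* toy parallel corpus as word indices */
int src[NUM_PAIRS][MAX_LEN] = { {0, 1}, {0, 2} };
int trg[NUM_PAIRS][MAX_LEN] = { {0, 1}, {0, 2} };

void em_iteration(void)
{
    int s, i, j, e, f;

    /* initialize counts */
    for (f = 0; f < SIZE_SOURCE; f++) {
        total[f] = 0.0;
        for (e = 0; e < SIZE_TARGET; e++)
            count[f][e] = 0.0;
    }

    for (s = 0; s < NUM_PAIRS; s++) {
        /* normalization 1 (sentence level):
           s_total(e) = sum of t(e|f) over f in this sentence */
        double s_total[SIZE_TARGET];
        for (i = 0; i < MAX_LEN; i++) {
            e = trg[s][i];
            s_total[e] = 0.0;
            for (j = 0; j < MAX_LEN; j++)
                s_total[e] += t[src[s][j]][e];
        }
        /* collect fractional counts */
        for (i = 0; i < MAX_LEN; i++) {
            e = trg[s][i];
            for (j = 0; j < MAX_LEN; j++) {
                f = src[s][j];
                count[f][e] += t[f][e] / s_total[e];
                total[f]    += t[f][e] / s_total[e];
            }
        }
    }

    /* normalization 2 (corpus level): t(e|f) = count(e|f) / total(f) */
    for (f = 0; f < SIZE_SOURCE; f++)
        for (e = 0; e < SIZE_TARGET; e++)
            if (total[f] > 0.0) /* guard foreign words never seen */
                t[f][e] = count[f][e] / total[f];
}

int main(void)
{
    int e, f, iter;

    /* initialize t(e|f) uniformly */
    for (f = 0; f < SIZE_SOURCE; f++)
        for (e = 0; e < SIZE_TARGET; e++)
            t[f][e] = 1.0 / SIZE_TARGET;

    for (iter = 0; iter < 10; iter++)
        em_iteration();

    /* each row of t should now sum to one */
    for (f = 0; f < SIZE_SOURCE; f++) {
        double sum = 0.0;
        for (e = 0; e < SIZE_TARGET; e++)
            sum += t[f][e];
        printf("sum over e of t(e|%d) = %f\n", f, sum);
    }
    return 0;
}

Running this prints a row sum of 1.000000 for every f that occurs in the
corpus, which is exactly the property you were asking about.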
-phi
On Mon, Jul 27, 2009 at 10:30 PM, James Read <[email protected]> wrote:
> In that case I really don't see how the code is guaranteed to give results
> which add up to 1.
>
> Quoting Philipp Koehn <[email protected]>:
>
>> Hi,
>>
>> this is LaTex {algorithmic} code.
>>
>> count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>
>> means
>>
>> count(e|f) += t(e|f) / s-total(e)
>>
>> So, you got that right.
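>>
>> With the array names from your C snippet (writing s_total for the
>> sentence normalizer, a name I am making up here), the two
>> count-collection lines would be something like:
>>
>> count[f][e] += t[f][e] / s_total[e];
>> total[f] += t[f][e] / s_total[e];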
>>
>> -phi
>>
>> On Mon, Jul 27, 2009 at 10:18 PM, James Read <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> this seems to be pretty much what I implemented. What exactly do you
>>> mean by these three lines?
>>>
>>> \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>> \STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>> \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>
>>> What do you mean by \frac? The pseudocode I was using shows these
>>> lines as a simple division, and this is what my code does, i.e.
>>>
>>> t(e|f) = count(e|f) / total(f)
>>>
>>> In C code something like:
>>>
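>>> /* re-estimation: turn the collected counts into probabilities t(e|f) */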
>>> for ( f = 0; f < size_source; f++ )
>>> {
>>> for ( e = 0; e < size_target; e++ )
>>> {
>>> t[f][e] = count[f][e] / total[f];
>>> }
>>> }
>>>
>>>
>>> Is this the kind of thing you mean?
>>>
>>> Thanks
>>> James
>>>
>>> Quoting Philipp Koehn <[email protected]>:
>>>
>>>> Hi,
>>>>
>>>> I think there was a flaw in some versions of the pseudocode.
>>>> The probabilities certainly need to add up to one. There are
>>>> two normalizations going on in the algorithm: one on the sentence
>>>> level (so that the probabilities of all alignments add up to one)
>>>> and one on the word level.
>>>>
>>>> Here is the most recent version:
>>>>
>>>> \REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
>>>> \ENSURE translation prob. $t(e|f)$
>>>> \STATE initialize $t(e|f)$ uniformly
>>>> \WHILE{not converged}
>>>> \STATE \COMMENT{initialize}
>>>> \STATE count($e|f$) = 0 {\bf for all} $e,f$
>>>> \STATE total($f$) = 0 {\bf for all} $f$
>>>> \FORALL{sentence pairs ({\bf e},{\bf f})}
>>>> \STATE \COMMENT{compute normalization}
>>>> \FORALL{words $e$ in {\bf e}}
>>>> \STATE s-total($e$) = 0
>>>> \FORALL{words $f$ in {\bf f}}
>>>> \STATE s-total($e$) += $t(e|f)$
>>>> \ENDFOR
>>>> \ENDFOR
>>>> \STATE \COMMENT{collect counts}
>>>> \FORALL{words $e$ in {\bf e}}
>>>> \FORALL{words $f$ in {\bf f}}
>>>> \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>> \STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>> \ENDFOR
>>>> \ENDFOR
>>>> \ENDFOR
>>>> \STATE \COMMENT{estimate probabilities}
>>>> \FORALL{foreign words $f$}
>>>> \FORALL{English words $e$}
>>>> \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>> \ENDFOR
>>>> \ENDFOR
>>>> \ENDWHILE
>>>>
>>>> -phi
>>>>
>>>>
>>>>
>>>> On Sun, Jul 26, 2009 at 5:24 PM, James Read <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have implemented the EM Model 1 algorithm as outlined in Koehn's
>>>>> lecture notes. I was surprised to find that the raw output of the
>>>>> algorithm gives a translation table in which, for any given source
>>>>> word, the probabilities of the possible target words sum to far more
>>>>> than 1.
>>>>>
>>>>> Is this normal?
>>>>>
>>>>> Thanks
>>>>> James
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support