Hi,

the word-level probabilities are guaranteed to add up to one because the final loop in each iteration is:

// estimate probabilities
for all foreign words f
  for all English words e
    t(e|f) = count(e|f) / total(f)

As I said, there are two normalizations: one on the
sentence level, the other on the corpus level.

-phi

On Mon, Jul 27, 2009 at 10:30 PM, James Read <[email protected]> wrote:
> In that case I really don't see how the code is guaranteed to give results
> which add up to 1.
>
> Quoting Philipp Koehn <[email protected]>:
>
>> Hi,
>>
>> this is LaTeX {algorithmic} code.
>>
>> count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>
>> means
>>
>> count(e|f) += t(e|f) / s-total(e)
>>
>> So, you got that right.
>>
>> -phi
>>
>> On Mon, Jul 27, 2009 at 10:18 PM, James Read <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> this seems to be pretty much what I implemented. What exactly do you mean
>>> by these three lines?
>>>
>>> \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>> \STATE total($f$)   += $\frac{t(e|f)}{\text{s-total}(e)}$
>>> \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>
>>> What do you mean by $\frac? The pseudocode I was using shows these lines
>>> as a simple division, and this is what my code does, i.e.
>>>
>>> t(e|f) = count(e|f) / total(f)
>>>
>>> In C code something like:
>>>
>>> /* corpus-level normalization */
>>> for ( f = 0; f < size_source; f++ )
>>> {
>>>     for ( e = 0; e < size_target; e++ )
>>>     {
>>>         t[f][e] = count[f][e] / total[f];
>>>     }
>>> }
>>>
>>>
>>> Is this the kind of thing you mean?
>>>
>>> Thanks
>>> James
>>>
>>> Quoting Philipp Koehn <[email protected]>:
>>>
>>>> Hi,
>>>>
>>>> I think there was a flaw in some versions of the pseudocode.
>>>> The probabilities certainly need to add up to one. There are
>>>> two normalizations going on in the algorithm: one on the sentence
>>>> level (so the probability of all alignments add up to one) and
>>>> one on the word level.
>>>>
>>>> Here is the most recent version:
>>>>
>>>> \REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
>>>> \ENSURE translation prob. $t(e|f)$
>>>> \STATE initialize $t(e|f)$ uniformly
>>>> \WHILE{not converged}
>>>>  \STATE \COMMENT{initialize}
>>>>  \STATE count($e|f$) = 0 {\bf for all} $e,f$
>>>>  \STATE total($f$) = 0 {\bf for all} $f$
>>>>  \FORALL{sentence pairs ({\bf e},{\bf f})}
>>>>   \STATE \COMMENT{compute normalization}
>>>>   \FORALL{words $e$ in {\bf e}}
>>>>     \STATE s-total($e$) = 0
>>>>     \FORALL{words $f$ in {\bf f}}
>>>>       \STATE s-total($e$) += $t(e|f)$
>>>>     \ENDFOR
>>>>   \ENDFOR
>>>>   \STATE \COMMENT{collect counts}
>>>>   \FORALL{words $e$ in {\bf e}}
>>>>     \FORALL{words $f$ in {\bf f}}
>>>>       \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>       \STATE total($f$)   += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>>     \ENDFOR
>>>>   \ENDFOR
>>>>  \ENDFOR
>>>>  \STATE \COMMENT{estimate probabilities}
>>>>  \FORALL{foreign words $f$}
>>>>   \FORALL{English words $e$}
>>>>     \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>>   \ENDFOR
>>>>  \ENDFOR
>>>> \ENDWHILE
>>>>
>>>> -phi
>>>>
>>>>
>>>>
>>>> On Sun, Jul 26, 2009 at 5:24 PM, James Read <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have implemented the EM Model 1 algorithm as outlined in Koehn's
>>>>> lecture notes. I was surprised to find that the raw output of the
>>>>> algorithm gives a translation table in which, for any particular
>>>>> source word, the probabilities of the possible target words sum to
>>>>> far more than 1.
>>>>>
>>>>> Is this normal?
>>>>>
>>>>> Thanks
>>>>> James
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
>
>
