In that case I really don't see how the code is guaranteed to give  
results which add up to 1.

Quoting Philipp Koehn <[email protected]>:

> Hi,
>
> this is LaTeX {algorithmic} code.
>
> count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>
> means
>
> count(e|f) += t(e|f) / s-total(e)
>
> So, you got that right.
>
> -phi
>
> On Mon, Jul 27, 2009 at 10:18 PM, James Read<[email protected]> wrote:
>> Hi,
>>
>> this seems to be pretty much what I implemented. What exactly do you mean by
>> these three lines?
>>
>> \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>> \STATE total($f$)   += $\frac{t(e|f)}{\text{s-total}(e)}$
>> \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>
>> What do you mean by $\frac$? The pseudocode I was using shows these lines as
>> a simple division, and this is what my code does, i.e.
>>
>> t(e|f) = count(e|f) / total(f)
>>
>> In C code something like:
>>
>> for ( f = 0; f < size_source; f++ )
>> {
>>   for ( e = 0; e < size_target; e++ )
>>   {
>>     /* re-estimate: normalize collected counts into probabilities */
>>     t[f][e] = count[f][e] / total[f];
>>   }
>> }
>>
>>
>> Is this the kind of thing you mean?
>>
>> Thanks
>> James
>>
>> Quoting Philipp Koehn <[email protected]>:
>>
>>> Hi,
>>>
>>> I think there was a flaw in some versions of the pseudocode.
>>> The probabilities certainly need to add up to one. There are
>>> two normalizations going on in the algorithm: one at the sentence
>>> level (so that the probabilities of all alignments add up to one)
>>> and one at the word level.
>>>
>>> Here is the most recent version:
>>>
>>> \REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
>>> \ENSURE translation prob. $t(e|f)$
>>> \STATE initialize $t(e|f)$ uniformly
>>> \WHILE{not converged}
>>>  \STATE \COMMENT{initialize}
>>>  \STATE count($e|f$) = 0 {\bf for all} $e,f$
>>>  \STATE total($f$) = 0 {\bf for all} $f$
>>>  \FORALL{sentence pairs ({\bf e},{\bf f})}
>>>    \STATE \COMMENT{compute normalization}
>>>    \FORALL{words $e$ in {\bf e}}
>>>      \STATE s-total($e$) = 0
>>>      \FORALL{words $f$ in {\bf f}}
>>>        \STATE s-total($e$) += $t(e|f)$
>>>      \ENDFOR
>>>    \ENDFOR
>>>    \STATE \COMMENT{collect counts}
>>>    \FORALL{words $e$ in {\bf e}}
>>>      \FORALL{words $f$ in {\bf f}}
>>>        \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>        \STATE total($f$)   += $\frac{t(e|f)}{\text{s-total}(e)}$
>>>      \ENDFOR
>>>    \ENDFOR
>>>  \ENDFOR
>>>  \STATE \COMMENT{estimate probabilities}
>>>  \FORALL{foreign words $f$}
>>>    \FORALL{English words $e$}
>>>      \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
>>>    \ENDFOR
>>>  \ENDFOR
>>> \ENDWHILE
>>>
>>> -phi
>>>
>>>
>>>
>>> On Sun, Jul 26, 2009 at 5:24 PM, James Read <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have implemented the EM Model 1 algorithm as outlined in Koehn's
>>>> lecture notes. I was surprised to find the raw output of the algorithm
>>>> gives a translation table that given any particular source word the
>>>> sum of the probabilities of each possible target word is far greater
>>>> than 1.
>>>>
>>>> Is this normal?
>>>>
>>>> Thanks
>>>> James
>>>>
>>>> --
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>
>>
>>
>>
>>
>>
>>
>
>


