Exactly, the discounting affects p(e|f) and p(f|e). I don't think it affects the
lexical probabilities, but Philipp wrote that code.
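
For anyone reading this in the archive, here is a minimal Python sketch of the back-calculation described in the quoted message below. It assumes the five-field layout shown in this thread (source ||| target ||| scores ||| alignment ||| counts) and no discounting; the names PhraseTableEntry, parse_entry and joint_count_estimates are made up for illustration and are not part of Moses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhraseTableEntry:
    source: str
    target: str
    scores: List[float]
    alignment: str          # empty unless alignment output was switched on
    target_count: float     # first number in the counts field
    source_count: float     # second number in the counts field


def parse_entry(line: str) -> PhraseTableEntry:
    """Split one phrase-table line on '|||' (layout assumed from this thread)."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    source, target, scores, alignment, counts = fields
    target_count, source_count = (float(c) for c in counts.split())
    return PhraseTableEntry(
        source=source,
        target=target,
        scores=[float(s) for s in scores.split()],
        alignment=alignment,
        target_count=target_count,
        source_count=source_count,
    )


def joint_count_estimates(entry: PhraseTableEntry) -> Tuple[float, float]:
    """Recover the joint count two ways: p(f|e)*count(e) and p(e|f)*count(f).

    Assumes scores[0] is p(f|e) and scores[2] is p(e|f), as in the example
    in this thread. The two values only agree when no discounting was used.
    """
    p_f_given_e = entry.scores[0]
    p_e_given_f = entry.scores[2]
    return (p_f_given_e * entry.target_count,
            p_e_given_f * entry.source_count)


if __name__ == "__main__":
    line = ("richtigerweise ||| quite properly ||| "
            "0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30")
    entry = parse_entry(line)
    # Both estimates should be close to 1 for this entry.
    print(joint_count_estimates(entry))
```

Both estimates come out at roughly 1 for the 'quite properly' entry, and at roughly 2 for the 'quite right' entry (279 and 30). With Good-Turing discounting the two products would no longer be expected to give back the raw count.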

On 27 May 2011 23:40, Lane Schwartz <[email protected]> wrote:

> Thanks, Hieu.
>
> Presumably, the discounting that you're talking about would affect the
> value of p(e|f), but would not affect the count values.
>
> Lane
>
>
>
> On Fri, May 27, 2011 at 12:01 PM, Hieu Hoang <[email protected]> wrote:
>
>> the counts are just the denominator when calculating p(e|f) and p(f|e).
>> For example
>>
>>    richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
>> means that
>>     30 phrase-pairs extracted with the source 'richtigerweise'
>> and
>>    5 phrase-pairs extracted with the target 'quite properly'
>>
>> since p(e|f) = 0.0333333, you can also work out that there was 1
>> (= 0.0333333*30 = 0.2*5) phrase pair extracted with the source AND target
>>    richtigerweise ||| quite properly
>>
>> if fractional counts had been used (mostly in hierarchical/syntax extraction)
>> then the counts would be fractional too. The only time you can't work
>> backwards like that is if you had used Good-Turing (GT) discounting.
>>
>>
>>
>> On 27/05/2011 22:35, Lane Schwartz wrote:
>>
>> Hieu,
>>
>> Could you elaborate on what the counts mean?
>>
>> How do they relate to the number of source and target phrases extracted
>> during training?
>>
>> Thanks,
>> Lane
>>
>> On Fri, May 27, 2011 at 11:31 AM, Hieu Hoang <[email protected]> wrote:
>>
>>> hi pratyush
>>>
>>> the 4th column contains alignment info if you had switched that option on
>>> during training. (Can you please point out where in the manual it says the
>>> 3rd & 4th are for alignment? I'll edit the manual. That was the old file
>>> format.)
>>>
>>> the 5th column contains the counts of the target phrase and the source
>>> phrase, in that order. It's not used during decoding, but it's useful for
>>> debugging and i thought it might be useful for other people.
>>>
>>>
>>> On 27/05/2011 22:13, Pratyush Banerjee wrote:
>>>
>>>  Hi,
>>>
>>> I have been trying to figure out the different fields in a standard
>>> phrase table generated by Moses.
>>>
>>> My phrase table trained on Europarl has lines like the following:
>>>
>>> richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
>>> richtigerweise ||| quite right to ||| 0.0238095 0.00123235 0.0333333 0.00409211 2.718 ||| ||| 42 30
>>> richtigerweise ||| quite right ||| 0.00716846 0.00123235 0.0666667 0.0216814 2.718 ||| ||| 279 30
>>> richtigerweise ||| quite rightly , ||| 0.0416667 0.0062868 0.0333333 0.00641196 2.718 ||| ||| 24 30
>>> richtigerweise ||| quite rightly ||| 0.0222222 0.0062868 0.166667 0.0487831 2.718 ||| ||| 225 30
>>>
>>> As per the documentation on the Moses website, I understand the first 3
>>> fields are:
>>>
>>> Field 1: Source Phrase
>>> Field 2: Target Phrase
>>> Field 3: scores (inverse phrase translation probability, inverse
>>> lexical weighting, direct phrase translation probability, direct
>>> lexical weighting, phrase penalty)
>>>
>>> However what are the 4th and 5th fields here?  Documentation says that
>>> alignment could be the 3rd and 4th field in the phrase table.
>>> Do they store the alignment information or is it the frequency of the
>>> phrases in source and target corpus respectively?
>>>
>>> It would be great if anybody could explain these fields or point me to a
>>> place where the information is.
>>>
>>> Thanks and Regards,
>>>
>>> Pratyush
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away.  It is time to go elsewhere.  The best thing about space travel
>> is that it made it possible to go elsewhere.
>>                 -- R.A. Heinlein, "Time Enough For Love"
>>
>>
>
>
>
