the counts are just the denominators when calculating p(e|f) and p(f|e). For example
   richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
means that
    30 phrase-pairs extracted with the source 'richtigerweise'
and
   5 phrase-pairs extracted with the target 'quite properly'

since p(e|f) = 0.0333 and p(f|e) = 0.2, you can also work out that there was 1 (= 0.0333*30 = 0.2*5) phrase pair extracted with both the source AND the target
   richtigerweise ||| quite properly

if fractional counts were used (mostly in hierarchical/syntax extraction) then these would be the fractional counts. The only time you can't work backwards like that is if you had used Good-Turing (GT) discounting.
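
To make the arithmetic above concrete, here is a minimal Python sketch that recovers the joint count from one phrase-table line. It assumes the score order given in the quoted question below (inverse phrase probability first, direct phrase probability third) and the count order described above (target count, then source count). The function names are made up for illustration and are not part of Moses.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line into its '|||'-separated fields."""
    fields = [field.strip() for field in line.split("|||")]
    source, target, scores, alignment, counts = fields[:5]
    # Assumed order of the counts column: target count, then source count.
    target_count, source_count = (float(c) for c in counts.split())
    return {
        "source": source,
        "target": target,
        "scores": [float(s) for s in scores.split()],
        "alignment": alignment,
        "target_count": target_count,
        "source_count": source_count,
    }


def joint_counts(entry):
    """Recover the joint count two ways; both should agree up to rounding."""
    p_f_given_e = entry["scores"][0]  # inverse phrase translation probability
    p_e_given_f = entry["scores"][2]  # direct phrase translation probability
    return (p_f_given_e * entry["target_count"],
            p_e_given_f * entry["source_count"])


line = ("richtigerweise ||| quite properly ||| "
        "0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30")
entry = parse_phrase_table_line(line)
print(joint_counts(entry))
```

Both values should come out close to 1 for this line. As noted above, this only works when the scores were not discounted.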


On 27/05/2011 22:35, Lane Schwartz wrote:
Hieu,
Could you elaborate on what the counts mean?
How do they relate to the number of source and target phrases extracted during training?
Thanks,
Lane

On Fri, May 27, 2011 at 11:31 AM, Hieu Hoang <[email protected]> wrote:

    hi pratyush

    the 4th column contains alignment info if you had switched that
    option on during training (can you please point out where in the
    manual it says the 3rd & 4th are for alignment - i'll edit the
    manual. That was the old file format)

    the 5th column contains the counts of the target phrase and the
    source phrase, in that order. It's not used during decoding, but
    it's useful for debugging and i thought it might be useful for
    other people.


    On 27/05/2011 22:13, Pratyush Banerjee wrote:
    Hi,

    I have been trying to figure out the different fields in a
    standard phrase table generated by Moses.

    My phrase table trained on Europarl has lines like the following:

    richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
    richtigerweise ||| quite right to ||| 0.0238095 0.00123235 0.0333333 0.00409211 2.718 ||| ||| 42 30
    richtigerweise ||| quite right ||| 0.00716846 0.00123235 0.0666667 0.0216814 2.718 ||| ||| 279 30
    richtigerweise ||| quite rightly , ||| 0.0416667 0.0062868 0.0333333 0.00641196 2.718 ||| ||| 24 30
    richtigerweise ||| quite rightly ||| 0.0222222 0.0062868 0.166667 0.0487831 2.718 ||| ||| 225 30

    As per documentation in the Moses website i understand the first
    3 fields are

    Field 1: Source Phrase
    Field 2: Target Phrase
    Field 3: scores (inverse phrase translation probability,
    inverse lexical weighting, direct phrase translation probability,
    direct lexical weighting, phrase penalty)

    However what are the 4th and 5th fields here?  Documentation says
    that alignment could be the 3rd and 4th field in the phrase table.
    Do they store the alignment information or is it the frequency of
    the phrases in source and target corpus respectively?

    It would be great if anybody could explain these fields or point
    me to a place where the information is.

    Thanks and Regards,

    Pratyush




    _______________________________________________
    Moses-support mailing list
    [email protected] <mailto:[email protected]>
    http://mailman.mit.edu/mailman/listinfo/moses-support





--
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
                -- R.A. Heinlein, "Time Enough For Love"
