The counts are just the denominators used when calculating p(e|f) and p(f|e).
For example
richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
means that
30 phrase pairs were extracted with the source 'richtigerweise'
and
5 phrase pairs were extracted with the target 'quite properly'
Since p(e|f) = 0.0333 and p(f|e) = 0.2, you can also work out that there was 1
(= 0.0333*30 = 0.2*5) phrase pair extracted with the source AND target
richtigerweise ||| quite properly
If fractional counts were used (mostly in hierarchical/syntax
extraction), then the counts would be the fractional counts. The only
time you can't work backwards like that is if you had used Good-Turing (GT) discounting.
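A minimal sketch in Python of that back-calculation, assuming the column layout of the example line above (five scores with p(f|e) first and p(e|f) third, an empty alignment column, and counts in the order target then source). The function name joint_count is made up for illustration and is not part of Moses.

```python
def joint_count(line):
    """Recover the joint phrase-pair count from one phrase-table line.

    Assumed layout (as in this thread, not checked against other Moses
    versions): source ||| target ||| scores ||| alignment ||| counts,
    where scores[0] = p(f|e), scores[2] = p(e|f),
    and counts = (target count, source count).
    """
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    target_count, source_count = (float(x) for x in fields[4].split())
    from_target = scores[0] * target_count  # p(f|e) * count(e)
    from_source = scores[2] * source_count  # p(e|f) * count(f)
    return source, target, from_target, from_source


line = ("richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 "
        "0.00361357 2.718 ||| ||| 5 30")
source, target, from_target, from_source = joint_count(line)
# Both estimates should agree up to rounding of the printed probabilities.
print(source, "|||", target, round(from_target), round(from_source))
```

Both products come out at about 1 for this line. As noted above, this would not hold if discounting had been applied when the scores were computed.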
On 27/05/2011 22:35, Lane Schwartz wrote:
Hieu,
Could you elaborate on what the counts mean?
How do they relate to the number of source and target phrases
extracted during training?
Thanks,
Lane
On Fri, May 27, 2011 at 11:31 AM, Hieu Hoang wrote:
hi pratyush
the 4th column contains alignment info if you had switched that
option on during training (can you please point out where in the
manual it says the 3rd & 4th are for alignment - i'll edit the
manual. That was the old file format)
the 5th column contains the counts of the target and the source
phrases, in that order. It's not used during decoding, but it's useful
for debugging and i thought it might be useful for other people.
On 27/05/2011 22:13, Pratyush Banerjee wrote:
Hi,
I have been trying to figure out the different fields in a
standard phrase table generated by Moses.
My phrase table trained on Europarl has lines like the following:
richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
richtigerweise ||| quite right to ||| 0.0238095 0.00123235 0.0333333 0.00409211 2.718 ||| ||| 42 30
richtigerweise ||| quite right ||| 0.00716846 0.00123235 0.0666667 0.0216814 2.718 ||| ||| 279 30
richtigerweise ||| quite rightly , ||| 0.0416667 0.0062868 0.0333333 0.00641196 2.718 ||| ||| 24 30
richtigerweise ||| quite rightly ||| 0.0222222 0.0062868 0.166667 0.0487831 2.718 ||| ||| 225 30
As per the documentation on the Moses website, I understand the first
3 fields are
Field 1: Source Phrase
Field 2: Target Phrase
Field 3: Scores (inverse phrase translation probability,
inverse lexical weighting, direct phrase translation probability,
direct lexical weighting, phrase penalty)
However, what are the 4th and 5th fields here? The documentation says
that alignment could be the 3rd and 4th field in the phrase table.
Do they store the alignment information or is it the frequency of
the phrases in source and target corpus respectively?
It would be great if anybody could explain these fields or point
me to a place where the information is.
Thanks and Regards,
Pratyush
_______________________________________________
Moses-support mailing list
[email protected] <mailto:[email protected]>
http://mailman.mit.edu/mailman/listinfo/moses-support
--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"