exactly, it affects p(e|f) and p(f|e). I don't think it affects the lexical probabilities, but Philipp wrote the code.
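The following is a minimal sketch of the arithmetic discussed in this thread. It assumes plain relative-frequency estimates, p(f|e) = c(e,f)/c(e) and p(e|f) = c(e,f)/c(f), and that the count column holds c(e) first and c(f) second. The function name `recover_joint_count` is hypothetical. The flat 0.5 discount is only an illustrative stand-in and is not Moses's Good-Turing discounting. It shows why a discounted table no longer gives back the raw extraction count.

```python
import math


def recover_joint_count(p_f_given_e, p_e_given_f, count_e, count_f):
    """Work back from p(f|e), p(e|f) and the marginal counts to c(e,f).

    Assumes p(f|e) = c(e,f) / c(e) and p(e|f) = c(e,f) / c(f), i.e. plain
    relative frequencies, so both products should give the same value.
    """
    from_target = p_f_given_e * count_e   # c(e,f) via the target count
    from_source = p_e_given_f * count_f   # c(e,f) via the source count
    if not math.isclose(from_target, from_source, rel_tol=1e-3):
        raise ValueError(f"counts disagree: {from_target} vs {from_source}")
    return from_target


# richtigerweise ||| quite properly: p(f|e)=0.2, p(e|f)=0.0333333, c(e)=5, c(f)=30
print(round(recover_joint_count(0.2, 0.0333333, 5, 30), 3))

# Hypothetical flat discount of 0.5 on the joint count (illustration only,
# not the Good-Turing scheme). The probabilities change while the marginal
# counts stay as they are, so working backwards yields the discounted
# count (0.5) rather than the 1 pair that was actually extracted.
raw_joint, c_e, c_f, discount = 1.0, 5, 30, 0.5
p_f_given_e = (raw_joint - discount) / c_e
p_e_given_f = (raw_joint - discount) / c_f
print(round(recover_joint_count(p_f_given_e, p_e_given_f, c_e, c_f), 3))
```

In this sketch both directions share the same discounted joint count, so the two products still agree with each other. What is lost is the link back to the number of phrase pairs actually extracted.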
On 27 May 2011 23:40, Lane Schwartz <[email protected]> wrote:
> Thanks, Hieu.
>
> Presumably, the discounting that you're talking about would affect the
> value of p(e|f), but would not affect the count values.
>
> Lane
>
> On Fri, May 27, 2011 at 12:01 PM, Hieu Hoang <[email protected]> wrote:
>
>> the counts are just the denominators when calculating p(e|f) and p(f|e).
>> For example
>>
>> richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
>>
>> means that
>> 30 phrase pairs were extracted with the source 'richtigerweise'
>> and
>> 5 phrase pairs were extracted with the target 'quite properly'
>>
>> since p(e|f) = 0.0333333, you can also work out that there was 1 (= 0.0333333*30
>> = 0.2*5) phrase pair extracted with the source AND target
>> richtigerweise ||| quite properly
>>
>> if fractional counts were used (mostly in hierarchical/syntax extraction),
>> then the counts would be the fractional counts. The only time you can't work
>> backwards like that is if you had used Good-Turing (GT) discounting.
>>
>> On 27/05/2011 22:35, Lane Schwartz wrote:
>>
>> Hieu,
>>
>> Could you elaborate on what the counts mean?
>>
>> How do they relate to the number of source and target phrases extracted
>> during training?
>>
>> Thanks,
>> Lane
>>
>> On Fri, May 27, 2011 at 11:31 AM, Hieu Hoang <[email protected]> wrote:
>>
>>> hi pratyush
>>>
>>> the 4th column contains alignment info if you had switched that option on
>>> during training (can you please point out where in the manual it says the
>>> 3rd & 4th are for alignment? i'll edit the manual. That was the old file
>>> format)
>>>
>>> the 5th column contains the count of the target and the source. It's not
>>> used during decoding, but it's useful for debugging and i thought it might
>>> be useful for other people.
>>>
>>> On 27/05/2011 22:13, Pratyush Banerjee wrote:
>>>
>>> Hi,
>>>
>>> I have been trying to figure out the different fields in a standard
>>> phrase table generated by Moses.
>>>
>>> My phrase table trained on Europarl has lines like the following:
>>>
>>> richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30
>>> richtigerweise ||| quite right to ||| 0.0238095 0.00123235 0.0333333 0.00409211 2.718 ||| ||| 42 30
>>> richtigerweise ||| quite right ||| 0.00716846 0.00123235 0.0666667 0.0216814 2.718 ||| ||| 279 30
>>> richtigerweise ||| quite rightly , ||| 0.0416667 0.0062868 0.0333333 0.00641196 2.718 ||| ||| 24 30
>>> richtigerweise ||| quite rightly ||| 0.0222222 0.0062868 0.166667 0.0487831 2.718 ||| ||| 225 30
>>>
>>> As per the documentation on the Moses website, I understand the first 3
>>> fields are:
>>>
>>> Field 1: Source Phrase
>>> Field 2: Target Phrase
>>> Field 3: scores (inverse phrase translation probability, inverse
>>> lexical weighting, direct phrase translation probability, direct
>>> lexical weighting, phrase penalty)
>>>
>>> However, what are the 4th and 5th fields here? The documentation says that
>>> alignment could be the 3rd and 4th field in the phrase table.
>>> Do they store the alignment information, or are they the frequencies of the
>>> phrases in the source and target corpus respectively?
>>>
>>> It would be great if anybody could explain these fields or point me to a
>>> place where the information is.
>>>
>>> Thanks and Regards,
>>>
>>> Pratyush
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away. It is time to go elsewhere. The best thing about space travel
>> is that it made it possible to go elsewhere.
>> -- R.A. Heinlein, "Time Enough For Love"
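The following is a minimal sketch of splitting one line of this format into its fields. It assumes the layout described in this thread:

- source and target phrases
- five scores: p(f|e), lex(f|e), p(e|f), lex(e|f), phrase penalty
- an alignment field that stays empty unless that option was switched on during training
- two counts, with the target count first and the source count second

`PhraseEntry` and `parse_phrase_table_line` are hypothetical names, not part of Moses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseEntry:
    source: str
    target: str
    p_f_given_e: float      # inverse phrase translation probability
    lex_f_given_e: float    # inverse lexical weighting
    p_e_given_f: float      # direct phrase translation probability
    lex_e_given_f: float    # direct lexical weighting
    phrase_penalty: float   # 2.718 in the example lines
    alignment: str          # empty unless alignment output was switched on
    count_target: float     # c(e): pairs extracted with this target phrase
    count_source: float     # c(f): pairs extracted with this source phrase

    def joint_count(self) -> float:
        """c(e,f) = p(e|f) * c(f); only meaningful without discounting."""
        return self.p_e_given_f * self.count_source


def parse_phrase_table_line(line: str) -> PhraseEntry:
    """Split a 'src ||| tgt ||| scores ||| alignment ||| counts' line."""
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    source, target, scores, alignment, counts = fields
    # Unpacking raises ValueError if there are not exactly 5 scores / 2 counts.
    p_fe, lex_fe, p_ef, lex_ef, penalty = (float(s) for s in scores.split())
    c_e, c_f = (float(c) for c in counts.split())
    return PhraseEntry(source, target, p_fe, lex_fe, p_ef, lex_ef, penalty,
                       alignment, c_e, c_f)


lines: List[str] = [
    "richtigerweise ||| quite properly ||| 0.2 0.0013016 0.0333333 0.00361357 2.718 ||| ||| 5 30",
    "richtigerweise ||| quite rightly ||| 0.0222222 0.0062868 0.166667 0.0487831 2.718 ||| ||| 225 30",
]
for line in lines:
    entry = parse_phrase_table_line(line)
    print(entry.target, entry.count_target, entry.count_source,
          round(entry.joint_count(), 2))
```

The counts are parsed as floats because, as mentioned in the thread, hierarchical/syntax extraction can produce fractional counts.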
