Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu
On 9 July 2015 at 18:21, Harshit Gupta <[email protected]> wrote:

> Thanks a lot for the help.
> I just have one more doubt about phrase tables. How are the values in the
> 2nd and 4th columns, the lexical weighting probabilities, calculated?
> Are they also calculated from the same counts, or do they use a
> different function?

They are calculated slightly differently, using the word alignment.

> And when producing the best translation, which of the 4 probabilities in
> the phrase table does Moses refer to?

All 4 probabilities are given weights during tuning. The best translation is the translation with the best weighted score.

> Thanks
>
> Regards
> Harshit
>
> On Thu, Jul 9, 2015 at 6:08 PM, Hieu Hoang <[email protected]> wrote:
>
>> Consider the 2nd line '33 7 2'.
>> count(target) = 33
>> count(source) = 7
>> count(source, target) = 2
>>
>> p(source|target) = count(source, target) / count(target) = 2/33 = 0.0606
>> p(target|source) = count(source, target) / count(source) = 2/7 = 0.2857
>>
>> As you can see, the probabilities match the 1st and 3rd numbers in the
>> probabilities column. The probabilities column is described here
>> http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
>>
>> On 09/07/2015 15:49, Harshit Gupta wrote:
>>
>> Hi Hieu, sorry but I didn't get the exact meaning of counts.
>> As an example, I am considering a few lines from my png file which have
>> the same English phrase (be):
>>
>> be ||| के समय को ||| 1 0.1 0.142857 2.63535e-05 ||| 0-2 ||| 1 7 1 ||| |||
>> be ||| को ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||
>> be ||| गया है ||| 1 0.0238095 0.142857 0.00162337 ||| 0-1 ||| 1 7 1 ||| |||
>> be ||| दिया गया है ||| 1 0.0238095 0.142857 1.40552e-05 ||| 0-2 ||| 1 7 1 ||| |||
>> be ||| समय को ||| 1 0.1 0.142857 0.000811687 ||| 0-1 ||| 1 7 1 ||| |||
>> be ||| है ||| 0.0196078 0.0238095 0.142857 0.125 ||| 0-0 ||| 51 7 1 ||| |||
>>
>> The column after the alignment column shows the counts. Why are these
>> counts different for the same English phrase? And what do the three
>> numbers '1 7 1', '51 7 1' or '33 7 2' represent? Do they represent the
>> number of times the source/target phrase occurs in the corpus, or are
>> they calculated using some rule/function in Moses?
>>
>> Thanks
>>
>> Regards
>> Harshit
>>
>> On Thu, Jul 9, 2015 at 4:13 PM, Hieu Hoang <[email protected]> wrote:
>>
>>> On 09/07/2015 14:19, Harshit Gupta wrote:
>>>
>>> Hi Hieu, thanks for the reply. However, I have some further doubts
>>> about this.
>>> By the count of a phrase, I mean how many times the phrase occurs in
>>> the corpus. So, can I get these counts from the cpp source file you
>>> mentioned?
>>> Also, in the phrase tables, the first four columns are the lexical
>>> weighting and phrase translation probabilities, followed by the
>>> alignments between the source and target language. Is it possible to
>>> get the counts of the phrases here as well?
>>>
>>> Yes, the next column (after the alignments) holds the counts. In your
>>> png file, '1 3 1' are the counts for the 1st translation rule.
>>>
>>> Regards
>>> Harshit
>>>
>>> On Thu, Jul 9, 2015 at 1:29 PM, Hieu Hoang <[email protected]> wrote:
>>>
>>>> The counts are written in the 5th column in the phrase table.
>>>> http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
>>>> This is for debugging purposes only; they don't influence decoding in
>>>> any way.
>>>>
>>>> If you want to know more about how it works: the counts are stored in
>>>> the files extract.*.sorted.gz and extract.*.inv.sorted.gz. The counts
>>>> are summed and the probability is calculated by the score program. The
>>>> source code for the score program is in
>>>> phrase-extract/score-main.cpp
>>>>
>>>> On 08/07/2015 18:05, Harshit Gupta wrote:
>>>>
>>>> Hi, I am currently working with Moses, and in the phrase tables I am
>>>> interested in the counts of phrases rather than the phrase translation
>>>> probabilities. Can I get these counts?
>>>> The Moses manual says, where it describes calculating phrase scores
>>>> during training:
>>>> "To estimate the phrase translation probability φ(e|f) we proceed as
>>>> follows: First, the extract file is sorted. This ensures that all English
>>>> phrase translations for an foreign phrase are next to each other in the
>>>> file. Thus, we can process the file, one foreign phrase at a time, *collect
>>>> counts* and compute φ(e|f) for that foreign phrase f."
>>>>
>>>> Where are these counts collected? Where can I get these counts?
>>>>
>>>> Regards
>>>> Harshit
>>>>
>>>> --
>>>> Harshit Gupta
>>>> Third Year Undergraduate
>>>> Electrical Engineering
>>>> IIT Madras
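
The count arithmetic discussed in the thread can be checked with a short script. The following is only a minimal sketch, not code from Moses: the function names (parse_phrase_table_line, phrase_probabilities, weighted_log_score) are hypothetical, the field layout is assumed from the example lines quoted in the thread, and the weights are made-up placeholders rather than tuned values. The weighted score covers only the four phrase-table scores, whereas a decoder would combine them with its other models.

```python
import math

# One line copied from the phrase table quoted in the thread.
EXAMPLE_LINE = "be ||| को ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||"


def parse_phrase_table_line(line):
    """Split a phrase-table line on '|||' and convert the numeric fields.

    Assumed layout (taken from the lines quoted in the thread):
    source ||| target ||| scores ||| alignment ||| counts ||| ...
    """
    fields = [field.strip() for field in line.split("|||")]
    return {
        "source": fields[0],
        "target": fields[1],
        "scores": [float(x) for x in fields[2].split()],
        "alignment": fields[3],
        # Order as explained in the thread:
        # count(target), count(source), count(source, target).
        "counts": [float(x) for x in fields[4].split()],
    }


def phrase_probabilities(counts):
    """Relative-frequency estimates from the three counts."""
    count_target, count_source, count_joint = counts
    p_source_given_target = count_joint / count_target
    p_target_given_source = count_joint / count_source
    return p_source_given_target, p_target_given_source


def weighted_log_score(scores, weights):
    """Weighted sum of log scores, for the phrase-table scores only."""
    return sum(w * math.log(s) for w, s in zip(weights, scores))


if __name__ == "__main__":
    entry = parse_phrase_table_line(EXAMPLE_LINE)
    p_st, p_ts = phrase_probabilities(entry["counts"])
    print(f"p(source|target) = {p_st:.4f}, table value = {entry['scores'][0]}")
    print(f"p(target|source) = {p_ts:.4f}, table value = {entry['scores'][2]}")
    # Hypothetical weights for illustration; real weights come from tuning.
    print(weighted_log_score(entry["scores"], [0.2, 0.2, 0.2, 0.2]))
```

For the '33 7 2' line this reproduces the 2/33 = 0.0606 and 2/7 = 0.2857 values worked out in the thread, which agree with the 1st and 3rd scores on that line. The 2nd and 4th scores (lexical weighting) are not recomputed here, since they depend on the word alignment rather than on these counts.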
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
