Consider the 2nd line, whose counts are '33 7 2'.
   count(target) = 33
   count(source) = 7
   count(source, target) = 2

p(source|target) = count(source, target) / count(target) = 2/33 = 0.0606
p(target|source) = count(source, target) / count(source) = 2/7 = 0.2857

As you can see, these probabilities match the 1st and 3rd numbers in the scores column, which is described here:
   http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
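The arithmetic above can be checked mechanically for all six 'be' lines quoted further down. The following is a minimal Python sketch, not part of Moses. It assumes the field layout source ||| target ||| scores ||| alignment ||| counts, and that the counts are ordered count(target) count(source) count(source, target), as in the worked example. The helper names `parse_line` and `check_line` are hypothetical.

```python
import math

# The six phrase-table lines quoted in this thread (English source phrase "be").
LINES = [
    "be ||| के समय को ||| 1 0.1 0.142857 2.63535e-05 ||| 0-2 ||| 1 7 1 ||| |||",
    "be ||| को ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||",
    "be ||| गया है ||| 1 0.0238095 0.142857 0.00162337 ||| 0-1 ||| 1 7 1 ||| |||",
    "be ||| दिया गया है ||| 1 0.0238095 0.142857 1.40552e-05 ||| 0-2 ||| 1 7 1 ||| |||",
    "be ||| समय को ||| 1 0.1 0.142857 0.000811687 ||| 0-1 ||| 1 7 1 ||| |||",
    "be ||| है ||| 0.0196078 0.0238095 0.142857 0.125 ||| 0-0 ||| 51 7 1 ||| |||",
]


def parse_line(line):
    """Split a phrase-table line into source, target, scores and the three counts.

    Assumes: source ||| target ||| scores ||| alignment ||| counts ||| ...
    with counts ordered count(target) count(source) count(source, target).
    """
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    c_target, c_source, c_joint = (float(x) for x in fields[4].split())
    return source, target, scores, c_target, c_source, c_joint


def check_line(line):
    """Recompute p(source|target) and p(target|source) from the counts."""
    _, target, scores, c_target, c_source, c_joint = parse_line(line)
    p_source_given_target = c_joint / c_target  # should match the 1st score
    p_target_given_source = c_joint / c_source  # should match the 3rd score
    ok = (math.isclose(p_source_given_target, scores[0], rel_tol=1e-5)
          and math.isclose(p_target_given_source, scores[2], rel_tol=1e-5))
    return target, p_source_given_target, p_target_given_source, ok


if __name__ == "__main__":
    for line in LINES:
        target, p_st, p_ts, ok = check_line(line)
        print(f"{target}\t{p_st:.4f}\t{p_ts:.4f}\t{'matches' if ok else 'MISMATCH'}")
```

Because count(source) is 7 on every line while count(target) varies (1, 33 or 51), the 3rd score changes only with the joint count, whereas the 1st score also depends on how often each Hindi phrase occurs.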

On 09/07/2015 15:49, Harshit Gupta wrote:
Hi Hieu, sorry, but I didn't understand exactly what the counts mean. As an example, here are a few lines from my png file that share the same English phrase (be):

be ||| के समय को ||| 1 0.1 0.142857 2.63535e-05 ||| 0-2 ||| 1 7 1 ||| |||
be ||| को ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||
be ||| गया है ||| 1 0.0238095 0.142857 0.00162337 ||| 0-1 ||| 1 7 1 ||| |||
be ||| दिया गया है ||| 1 0.0238095 0.142857 1.40552e-05 ||| 0-2 ||| 1 7 1 ||| |||
be ||| समय को ||| 1 0.1 0.142857 0.000811687 ||| 0-1 ||| 1 7 1 ||| |||
be ||| है ||| 0.0196078 0.0238095 0.142857 0.125 ||| 0-0 ||| 51 7 1 ||| |||

The column after the alignment column shows the counts. Why are these counts different for the same English phrase? And what do the three numbers '1 7 1', '51 7 1' and '33 7 2' represent? Do they represent the number of times the source/target phrase occurs in the corpus, or are they calculated by some rule/function in Moses?

Thanks

Regards
Harshit

On Thu, Jul 9, 2015 at 4:13 PM, Hieu Hoang <[email protected]> wrote:



    On 09/07/2015 14:19, Harshit Gupta wrote:
    Hi Hieu, thanks for the reply. However, I have some further
    questions about this.
    By the count of a phrase, I mean how many times a phrase
    occurs in the corpus. Can I get these counts from the cpp
    source file you mentioned?
    Also, in the phrase table, the first four numbers are the
    lexical weights and phrase translation probabilities, followed
    by the alignments between the source and target phrases. Is it
    possible to get the phrase counts here as well?
    Yes, the next column (after the alignments) contains the counts.
    In your png file, '1 3 1' are the counts for the 1st
    translation rule.


    Regards
    Harshit

    On Thu, Jul 9, 2015 at 1:29 PM, Hieu Hoang <[email protected]> wrote:

        The counts are written in the 5th column of the phrase table:
        http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
        They are for debugging purposes only; they don't influence
        decoding in any way.

        If you want to know more about how it works: the counts are
        taken from the files extract.*.sorted.gz and
        extract.*.inv.sorted.gz. The score program sums the counts
        and calculates the probabilities. Its source code is in
           phrase-extract/score-main.cpp
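As a rough illustration of how summed counts yield the three numbers in the counts column, here is a toy Python sketch. It is not the logic of score-main.cpp: it assumes each extracted phrase-pair occurrence is a hypothetical (source, target) tuple standing in for one line of the extract files, and the example data are invented.

```python
from collections import Counter


def count_phrases(extracted_pairs):
    """Tally marginal and joint counts over extracted phrase-pair occurrences.

    extracted_pairs: iterable of (source, target) tuples, one per extracted
    occurrence (a simplified, hypothetical stand-in for the extract files).
    """
    source_counts = Counter()
    target_counts = Counter()
    joint_counts = Counter()
    for source, target in extracted_pairs:
        source_counts[source] += 1
        target_counts[target] += 1
        joint_counts[(source, target)] += 1
    return source_counts, target_counts, joint_counts


def counts_field(source, target, source_counts, target_counts, joint_counts):
    """Format as 'count(target) count(source) count(source, target)'."""
    return f"{target_counts[target]} {source_counts[source]} {joint_counts[(source, target)]}"


if __name__ == "__main__":
    # Invented toy data: "be" is extracted 3 times, "को" 3 times in total.
    pairs = [("be", "को"), ("be", "को"), ("be", "है"), ("to", "को"), ("is", "है")]
    src, tgt, joint = count_phrases(pairs)
    print(counts_field("be", "को", src, tgt, joint))  # 3 3 2
    print(counts_field("be", "है", src, tgt, joint))  # 2 3 1
```

Dividing the joint count by each marginal then gives the two phrase translation probabilities, as in the '33 7 2' example.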


        On 08/07/2015 18:05, Harshit Gupta wrote:
        Hi, I am currently working with the Moses platform, and in the
        phrase tables I am interested in the counts of phrases
        rather than the phrase translation probabilities. How can I
        get these counts?
        The Moses manual says, in the section on calculating phrase
        scores during training:
        "To estimate the phrase translation probability φ(e|f) we
        proceed as follows: First, the extract file is sorted. This
        ensures that all English phrase translations for an foreign
        phrase are next to each other in the file. Thus, we can
        process the file, one foreign phrase at a time, *collect
        counts* and compute φ(e|f) for that foreign phrase f."
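The sort-then-scan procedure described in the manual can be sketched in a few lines of Python. This is an illustrative simplification, not the Moses implementation: it assumes each extract line has the hypothetical form `source ||| target ||| alignment`, and it relies on `itertools.groupby`, which only groups adjacent lines, so the input must already be sorted by source phrase.

```python
from collections import Counter
from itertools import groupby


def source_phrase(line):
    """Return the first ' ||| '-separated field (the source phrase)."""
    return line.split(" ||| ")[0]


def direct_probabilities(sorted_lines):
    """Yield (source, target, joint count, source count, phi(target|source)).

    Assumes sorted_lines is sorted so that all lines for one source phrase
    are adjacent; only that phrase's counts are held in memory at a time.
    """
    for source, group in groupby(sorted_lines, key=source_phrase):
        joint = Counter(line.split(" ||| ")[1] for line in group)
        total = sum(joint.values())  # count(source)
        for target, count in sorted(joint.items()):
            yield source, target, count, total, count / total


if __name__ == "__main__":
    extract = sorted([
        "be ||| को ||| 0-0",
        "is ||| है ||| 0-0",
        "be ||| है ||| 0-0",
        "be ||| को ||| 0-0",
    ])
    for row in direct_probabilities(extract):
        print(row)
```

The same scan over an inverted, target-sorted file would give the other direction, p(source|target).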

        Where are these counts collected? Where can I get these
        counts?

        Regards
        Harshit

        --
        Harshit Gupta
        Third Year Undergraduate
        Electrical Engineering
        IIT Madras


        _______________________________________________
        Moses-support mailing list
        [email protected]
        http://mailman.mit.edu/mailman/listinfo/moses-support

        --
        Hieu Hoang
        Researcher
        New York University, Abu Dhabi
        http://www.hoang.co.uk/hieu




