On 09/07/2015 14:19, Harshit Gupta wrote:
Hi Hieu, Thanks for the reply. However, I have some further doubts about this. By the count of a phrase, I mean how many times the phrase occurs in the corpora. So, can I get these counts from the cpp source file you mentioned? Also, in the phrase tables, the first four columns are the lexical weightings and phrase translation probabilities, and then there are the alignments between the source and target language. Here also, is it possible to get the counts of the phrases?
Yes, the next column (after the alignments) holds the counts. In your png file, the column '1 3 1' contains the counts for the 1st translation rule.
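
As a rough illustration of reading that column, here is a minimal Python sketch. It is not part of Moses. It assumes the usual text phrase table layout with fields separated by `|||` and the counts in the 5th field; the function names and the example line are made up for illustration. The meaning of each of the three numbers is deliberately left uninterpreted here, so check the Moses documentation for your version.

```python
import gzip


def parse_phrase_table_line(line):
    """Return (source, target, counts) for one phrase-table line.

    Assumes fields are separated by '|||' and that the 5th field
    holds the whitespace-separated counts.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) < 5:
        raise ValueError("no counts field on this line: %r" % line)
    # Counts are parsed as floats in case a table stores fractional counts.
    counts = [float(value) for value in fields[4].split()]
    return fields[0], fields[1], counts


def iter_phrase_counts(path):
    """Yield (source, target, counts) for every line of a gzipped phrase table."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_phrase_table_line(line)


# Hypothetical line, shaped like a phrase-table entry with counts '1 3 1'.
example = "das Haus ||| the house ||| 0.5 0.4 0.33 0.3 ||| 0-0 1-1 ||| 1 3 1"
source, target, counts = parse_phrase_table_line(example)
print(source, "->", target, counts)
```

To walk a whole table you would call `iter_phrase_counts` with the path to your own gzipped phrase table.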

Regards
Harshit

On Thu, Jul 9, 2015 at 1:29 PM, Hieu Hoang <[email protected]> wrote:

    The counts are written in the 5th column in the phrase table.
    http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
    They are for debugging purposes only; they don't influence decoding
    in any way.

    If you want to know more about how it works - the counts are
    stored in the file extract.*.sorted.gz and
    extract.*.inv.sorted.gz. The counts are summed and the probability
    is calculated by the score program. The source code for the score
    program is in
       phrase-extract/score-main.cpp
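
For intuition only, the "sum the counts, then compute the probability" step can be sketched in Python as below. This is not the code in score-main.cpp: it ignores lexical weighting, smoothing and the inverse direction, and it assumes each extract line looks like `foreign ||| english ||| alignment` and that the lines are already sorted by foreign phrase. The function name and the sample lines are hypothetical.

```python
from collections import Counter
from itertools import groupby


def relative_frequencies(sorted_lines):
    """Yield (foreign, english, count, phi) from sorted extract lines.

    phi is the plain relative frequency count(f, e) / count(f).
    """
    def phrase_pair(line):
        fields = [field.strip() for field in line.split("|||")]
        return fields[0], fields[1]

    pairs = (phrase_pair(line) for line in sorted_lines if line.strip())
    # Because the input is sorted, all lines for one foreign phrase are
    # adjacent, so each phrase can be processed as a single group.
    for foreign, group in groupby(pairs, key=lambda pair: pair[0]):
        counts = Counter(english for _, english in group)
        total = sum(counts.values())
        for english, count in sorted(counts.items()):
            yield foreign, english, count, count / total


# Hypothetical extract lines, already sorted by foreign phrase.
lines = [
    "haus ||| home ||| 0-0",
    "haus ||| house ||| 0-0",
    "haus ||| house ||| 0-0",
    "hund ||| dog ||| 0-0",
]
for row in relative_frequencies(lines):
    print(row)
```

Grouping adjacent lines is what makes the sort necessary: only one foreign phrase has to be held in memory at a time.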


    On 08/07/2015 18:05, Harshit Gupta wrote:
    Hi, I am currently working with Moses and, in the phrase
    tables, I am interested in the counts of phrases instead of the
    phrase translation probabilities. Can I get these counts?
    The Moses manual, in the part of the training process that
    calculates phrase scores, says:
    "To estimate the phrase translation probability φ(e|f) we proceed
    as follows: First, the extract file is sorted. This ensures that
    all English phrase translations for an foreign phrase are next to
    each other in the file. Thus, we can process the file, one
    foreign phrase at a time, *collect counts* and compute φ(e|f) for
    that foreign phrase f."

    Where are these counts collected? Where can I get these counts?

    Regards
    Harshit

-- Harshit Gupta
    Third Year Undergraduate
    Electrical Engineering
    IIT Madras


    _______________________________________________
    Moses-support mailing list
    [email protected]
    http://mailman.mit.edu/mailman/listinfo/moses-support

-- Hieu Hoang
    Researcher
    New York University, Abu Dhabi
    http://www.hoang.co.uk/hieu





