Consider the 2nd line, whose counts column is '33 7 2'. Here the source is 'be' and the target is 'को'.
count(target) = 33
count(source) = 7
count(source, target) = 2
p(source|target) = count(source, target) / count(target) = 2/33 = 0.0606
p(target|source) = count(source, target) / count(source) = 2/7 = 0.2857
As you can see, these match the 1st and 3rd numbers in the
probabilities column (0.0606061 and 0.285714). The probabilities column is described here:
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
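The arithmetic above can be checked with a short script. This is only a sketch: the function names are made up for illustration and are not part of Moses, and it assumes the fields are separated by '|||' with the counts in the 5th field, ordered as count(target), count(source), count(source, target), as in the example line.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line into source, target, scores and counts.

    Assumes '|||' separates the fields and that the counts are the 5th field.
    """
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    counts = [float(x) for x in fields[4].split()]
    return source, target, scores, counts


def probabilities_from_counts(counts):
    """Return (p(source|target), p(target|source)) from the counts field.

    Assumed order: count(target), count(source), count(source, target).
    """
    count_target, count_source, count_joint = counts[:3]
    return count_joint / count_target, count_joint / count_source


line = "be ||| को ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||"
source, target, scores, counts = parse_phrase_table_line(line)
p_source_given_target, p_target_given_source = probabilities_from_counts(counts)

# Compare against the 1st and 3rd scores stored on the same line.
print(p_source_given_target, scores[0])
print(p_target_given_source, scores[2])
```

The same check can be run on the other lines, for example '51 7 1' gives 1/51 and 1/7.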
On 09/07/2015 15:49, Harshit Gupta wrote:
Hi Hieu, sorry, but I didn't get the exact meaning of the counts. As an
example, consider a few lines from my png file that share the same
English phrase (be):
be ||| के समय को ||| 1 0.1 0.142857 2.63535e-05 ||| 0-2 ||| 1 7 1 ||| |||
be ||| को ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||
be ||| गया है ||| 1 0.0238095 0.142857 0.00162337 ||| 0-1 ||| 1 7 1 ||| |||
be ||| दिया गया है ||| 1 0.0238095 0.142857 1.40552e-05 ||| 0-2 ||| 1 7 1 ||| |||
be ||| समय को ||| 1 0.1 0.142857 0.000811687 ||| 0-1 ||| 1 7 1 ||| |||
be ||| है ||| 0.0196078 0.0238095 0.142857 0.125 ||| 0-0 ||| 51 7 1 ||| |||
The column after the alignment column shows the counts. Why are these
counts different for the same English phrase? And what do the three
numbers '1 7 1', '51 7 1' or '33 7 2' represent? Do they
represent the number of times the source/target phrase occurs
in the corpus, or are they calculated using some rule/function in
Moses?
Thanks
Regards
Harshit
On Thu, Jul 9, 2015 at 4:13 PM, Hieu Hoang wrote:
On 09/07/2015 14:19, Harshit Gupta wrote:
Hi Hieu, thanks for the reply. However, I have some further
questions about this.
By the count of a phrase, I mean how many times the phrase
occurs in the corpus. Can I get these counts from the cpp
source file you mentioned?
Also, in the phrase tables, the first four scores are the
lexical weights and phrase translation probabilities, followed
by the alignments between the source and target language. Is it
possible to get the phrase counts here as well?
Yes, the next column (after the alignments) contains the counts. In
your png file, the column '1 3 1' holds the counts for the 1st
translation rule.
Regards
Harshit
On Thu, Jul 9, 2015 at 1:29 PM, Hieu Hoang wrote:
The counts are written in the 5th column in the phrase table.
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
The counts are there for debugging purposes only; they don't
influence decoding in any way.
If you want to know more about how it works: the counts are
stored in the files extract.*.sorted.gz and
extract.*.inv.sorted.gz. The counts are summed and the
probabilities are calculated by the score program. The source
code for the score program is in
phrase-extract/score-main.cpp
On 08/07/2015 18:05, Harshit Gupta wrote:
Hi, I am currently working with Moses, and in the
phrase tables I am interested in the counts of phrases
rather than the phrase translation probabilities. Is there a
way to get these counts?
The Moses manual, in the part of the training process that
describes how phrase scores are calculated, says:
"To estimate the phrase translation probability φ(e|f) we
proceed as follows: First, the extract file is sorted. This
ensures that all English phrase translations for an foreign
phrase are next to each other in the file. Thus, we can
process the file, one foreign phrase at a time, *collect
counts* and compute φ(e|f) for that foreign phrase f."
Where are these counts collected? Where can I get these
counts?
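The procedure quoted from the manual (sort, then process one foreign phrase at a time, collect counts, compute φ(e|f)) can be sketched as follows. This is a toy illustration under stated assumptions, not the Moses score program: the function name is hypothetical, the phrase pairs are made-up data held in memory, and the real extract file format is not modelled.

```python
from collections import Counter
from itertools import groupby


def phrase_probabilities(sorted_pairs):
    """Yield (f, e, count(f, e), count(f), phi(e|f)).

    sorted_pairs must be (foreign, english) tuples sorted by the foreign
    phrase, so that all translations of one foreign phrase are adjacent.
    """
    for f, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        # Collect counts for this foreign phrase only.
        joint_counts = Counter(e for _, e in group)
        count_f = sum(joint_counts.values())
        for e, count_fe in joint_counts.items():
            yield f, e, count_fe, count_f, count_fe / count_f


# Made-up extracted phrase pairs; sorting puts equal foreign phrases together.
pairs = sorted([("be", "को"), ("be", "है"), ("be", "को"), ("is", "है")])
rows = list(phrase_probabilities(pairs))

for f, e, count_fe, count_f, phi in rows:
    print(f, e, count_fe, count_f, phi)
```

Because the input is sorted, only the counts for one foreign phrase need to be held in memory at a time, which is the point of the sorting step in the quoted description.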
Regards
Harshit
--
Harshit Gupta
Third Year Undergraduate
Electrical Engineering
IIT Madras
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu