Hi,

the total number of extracted phrases in a sentence pair depends on:
- the particular word alignment you are considering
- the heuristic you adopt for words left unaligned or aligned to the null word
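To make the second point concrete, here is a minimal sketch in Python, not the actual Moses extraction code. It enumerates the phrase pairs that are consistent with a word alignment in the sense of the Koehn et al. (2003) rule, and compares two ways of treating unaligned words at phrase boundaries. The function name `extract_phrases`, the flag `allow_unaligned_edges`, the `max_len` limit and the toy alignment are all hypothetical, chosen only for illustration.

```python
def extract_phrases(src_len, tgt_len, alignment, max_len=7,
                    allow_unaligned_edges=True):
    """Return the set of consistent phrase pairs as inclusive index spans
    (s_start, s_end, t_start, t_end).

    alignment is a set of (source_index, target_index) links.
    max_len is an illustrative cap on phrase length on either side.
    """
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(t1 + max_len, tgt_len)):
                    # Links that fall entirely inside the candidate box.
                    inside = [(s, t) for s, t in alignment
                              if s1 <= s <= s2 and t1 <= t <= t2]
                    if not inside:
                        continue  # require at least one alignment link
                    # Consistency: no link may have one end inside the
                    # box and the other end outside it.
                    if any((s1 <= s <= s2) != (t1 <= t <= t2)
                           for s, t in alignment):
                        continue
                    if not allow_unaligned_edges:
                        # Stricter heuristic: the first and last word on
                        # each side must themselves be aligned.
                        src_aligned = {s for s, _ in inside}
                        tgt_aligned = {t for _, t in inside}
                        if not ({s1, s2} <= src_aligned
                                and {t1, t2} <= tgt_aligned):
                            continue
                    pairs.add((s1, s2, t1, t2))
    return pairs


# Toy sentence pair: 3 source words, 3 target words; the middle word on
# each side is left unaligned.
toy_alignment = {(0, 0), (2, 2)}

strict = extract_phrases(3, 3, toy_alignment, allow_unaligned_edges=False)
loose = extract_phrases(3, 3, toy_alignment, allow_unaligned_edges=True)

print(len(strict))  # 3
print(len(loose))   # 9
```

With the same alignment and the same consistency rule, letting phrases absorb unaligned boundary words triples the count on this toy pair. A difference of this kind between two extractors is one possible reason for totals as far apart as 33 million and 90 million.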
Greetings,
Marcello

-------
Short from my mobile phone

On 30 Jan 2013, at 05:46 PM, "Cuong Hoang" <hoangcuong2...@gmail.com> wrote:

Hi all,

I wrote a phrase extractor using the simple rule from Koehn et al., 2003:

"We collect all aligned phrase pairs that are consistent with the word alignment: The words in a legal phrase pair are only aligned to each other, and not to words outside."

I tested it on a fairly large bilingual corpus containing 500,000 sentence pairs and obtained 33 million phrase pairs. However, when I use Moses to extract phrases, I obtain around 90 million pairs.

Does Moses use some other rules, or is there something wrong on my side?

Thanks,
C. Hoang

--
Best Regards,
C. Hoang
{Mimosa, SMT}@Addict

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support