Hi, the total number of phrases extracted from a sentence pair depends on:
- the particular word alignment you are using
- the heuristic you adopt for words left unaligned or aligned to the null word
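
To illustrate the second point, here is a minimal sketch (not the Moses
extractor) of consistency-based extraction in Python. The function name
extract_phrases, the "tight" flag, and the max_len cap of 7 are assumptions
made up for this example. With tight=True, a phrase may not start or end on an
unaligned word; with tight=False, unaligned words at the boundary are allowed.

```python
def extract_phrases(src_len, tgt_len, alignment, max_len=7, tight=False):
    """Return spans (s1, s2, t1, t2), inclusive and 0-based, that are
    consistent with the alignment (a set of (src_idx, tgt_idx) points).

    max_len and tight are illustrative assumptions, not Moses options.
    """
    aligned_src = {s for s, _ in alignment}
    aligned_tgt = {t for _, t in alignment}
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(t1 + max_len, tgt_len)):
                    inside = 0
                    consistent = True
                    for s, t in alignment:
                        s_in = s1 <= s <= s2
                        t_in = t1 <= t <= t2
                        if s_in != t_in:
                            # A word inside the pair is aligned to a word outside.
                            consistent = False
                            break
                        if s_in:
                            inside += 1
                    # Require at least one alignment point inside the pair.
                    if not consistent or inside == 0:
                        continue
                    if tight and not (
                        {s1, s2} <= aligned_src and {t1, t2} <= aligned_tgt
                    ):
                        # Tight variant: boundary words must be aligned.
                        continue
                    pairs.add((s1, s2, t1, t2))
    return pairs


# Toy sentence pair: 3 words on each side, the middle word unaligned on both.
alignment = {(0, 0), (2, 2)}
loose = extract_phrases(3, 3, alignment, tight=False)
tight = extract_phrases(3, 3, alignment, tight=True)
print(len(loose), len(tight))
```

On this toy pair the same alignment gives 9 phrase pairs with the loose
treatment and 3 with the tight one, so the choice alone can change the count
by a large factor.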

Greetings,

Marcello

-------
Short from my mobile phone

On 30/Jan/2013, at 05:46 PM, "Cuong Hoang" <hoangcuong2...@gmail.com> wrote:

Hi all,
I wrote a phrase extractor that follows the simple rule from Koehn et al., 
2003:

"We collect all aligned phrase pairs that are consistent with the word 
alignment: The words in a legal phrase pair are only aligned to each other, and 
not to words outside."

I tested it on a fairly large bilingual corpus containing 500,000 sentence 
pairs and obtained 33 million phrase pairs.
However, when I use Moses to extract phrases, I obtain around 90 million pairs.

Does Moses use some other rules, or is there something wrong on my side?

Thanks,
C. Hoang
--
Best Regards,
C. Hoang

{Mimosa, SMT}@Addict
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
