Hi Ondrej,

See below.

> And one additional question: when extracting phrases, phrase-extract actually
> extracts all phrases that *are not incompatible* with the alignment. I'm
> thinking about a different method: just phrases that *are 'strictly'
> compatible*, which means I would extract:
>
> a=A
> c=C
> abc=ABC
>
> but not
>
> ab=AB
> bc=BC
>
> from:
>
>     a b c
>   A *
>   B
>   C     *
>
> Any experience with/intuition about that? Surely, there would be far fewer
> phrases extracted...
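To make the two variants concrete, here is a minimal brute-force sketch in Python. The helpers `is_consistent` and `extract` are hypothetical illustrations, not Moses' phrase-extract code, which uses a more efficient span-based search and a maximum phrase length that this sketch omits. A phrase pair is kept when its box contains at least one alignment point and no link crosses the box boundary. In "tight" mode the first and last word on both sides must also be aligned.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source_index, target_index) links
PhrasePair = Tuple[str, str]


def is_consistent(alignment: Alignment, s1: int, s2: int, t1: int, t2: int) -> bool:
    """True if the box [s1, s2] x [t1, t2] contains at least one link and
    no link connects a word inside the box to a word outside it."""
    has_point = False
    for i, j in alignment:
        src_in = s1 <= i <= s2
        tgt_in = t1 <= j <= t2
        if src_in != tgt_in:
            return False  # this link crosses the phrase boundary
        if src_in:
            has_point = True
    return has_point


def extract(src: List[str], tgt: List[str], alignment: Alignment,
            tight: bool = False) -> List[PhrasePair]:
    """Enumerate every source/target span pair and keep the consistent ones.

    loose (default): unaligned words may sit at the phrase edges.
    tight: the first and last word on both sides must be aligned.
    """
    src_aligned = {i for i, _ in alignment}
    tgt_aligned = {j for _, j in alignment}
    pairs: List[PhrasePair] = []
    for s1 in range(len(src)):
        for s2 in range(s1, len(src)):
            for t1 in range(len(tgt)):
                for t2 in range(t1, len(tgt)):
                    if not is_consistent(alignment, s1, s2, t1, t2):
                        continue
                    if tight and not ({s1, s2} <= src_aligned
                                      and {t1, t2} <= tgt_aligned):
                        continue  # an edge word is unaligned
                    pairs.append((" ".join(src[s1:s2 + 1]),
                                  " ".join(tgt[t1:t2 + 1])))
    return pairs


# The example from the question: a-A and c-C are linked; b and B are unaligned.
src = "a b c".split()
tgt = "A B C".split()
alignment = {(0, 0), (2, 2)}

print(extract(src, tgt, alignment, tight=True))
# → [('a', 'A'), ('a b c', 'A B C'), ('c', 'C')]
print(len(extract(src, tgt, alignment)))
# → 9
```

On this example the tight variant yields exactly the three pairs listed in the question. The loose variant yields nine pairs: ab=AB and bc=BC, plus a=AB, ab=A, bc=C and c=BC, because the unaligned b and B may attach to either neighbour.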
The difference in extraction you're talking about is generally referred to as "tight" vs. "loose" extraction. Tight extraction, which is what you describe, rejects phrase pairs that begin or end with an unaligned word. Loose extraction, which is what phrase-extract does, allows them. Fazil Ayan looked at the effects of these extraction heuristics in this paper (see sec. 4.2):

ftp://ftp.umiacs.umd.edu/pub/bonnie/acl06_final.pdf

-Chris

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
