Dear Philipp (and others, if that stupid Barracuda spam filter at MIT allows my
question to the list),
I've noticed there's a flag to turn on 'proper' conditioning in phrase-extract.
I have not carefully compared the outputs, but I guess it causes all
occurrences of a foreign (source) phrase f to be counted, regardless of whether
they were aligned to a target phrase in a compatible fashion.
Am I correct that P(e|f) then becomes deficient, i.e. does not sum to 1 over e
for a given f? (P(not-aligned-consistently | f) would be the missing part.)
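
To make the deficiency question concrete, here is a minimal sketch with made-up
counts. It assumes, as guessed above and not verified against phrase-extract,
that 'proper' conditioning only changes the denominator from "consistent
extractions of f" to "all occurrences of f". The names `extracted` and
`occurrences_of_f` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy counts for a single source phrase "f".
# (f, e) -> number of times the pair was extracted consistently.
extracted = Counter({("f", "e1"): 3, ("f", "e2"): 1})

# Assumed: f occurs 6 times in the corpus, 2 of them with no
# consistently aligned target phrase.
occurrences_of_f = {"f": 6}


def p_standard(e, f):
    """Relative frequency over consistent extractions of f only."""
    total = sum(c for (ff, _), c in extracted.items() if ff == f)
    return Fraction(extracted[(f, e)], total)


def p_proper(e, f):
    """Relative frequency over all occurrences of f (assumed behaviour)."""
    return Fraction(extracted[(f, e)], occurrences_of_f[f])


targets = ["e1", "e2"]
mass_standard = sum(p_standard(e, "f") for e in targets)
mass_proper = sum(p_proper(e, "f") for e in targets)
# Under the assumption above, this is P(not-aligned-consistently | f).
missing_mass = 1 - mass_proper

print(mass_standard, mass_proper, missing_mass)
```

With these toy numbers the standard estimate sums to 1 and the 'properly'
conditioned one sums to 2/3, so 1/3 of the mass is left for the unaligned case.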
Do properly-conditioned phrase tables indeed work better (in terms of BLEU or,
e.g., the number of MERT iterations)?
And one additional question: when extracting phrases, phrase-extract actually
extracts all phrases that *are not incompatible* with the alignment. I'm
thinking about a different method: extracting only phrases that *are 'strictly'
compatible*, which means I would extract:
a=A
c=C
abc=ABC
but not
ab=AB
bc=BC
from:
  a b c
A *
B
C     *
Any experience with/intuition about that? Surely, there would be far fewer
phrases extracted...
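
The two extraction regimes can be sketched on the example above. This is an
illustration, not phrase-extract's code: the function `extract` and its
`strict` switch are hypothetical, there is no phrase-length limit, and 'strict'
is read here as "the first and last word of both the source and the target span
must be aligned", which is one interpretation that reproduces the list above.

```python
def extract(src, tgt, links, strict=False):
    """Return the set of (source phrase, target phrase) pairs.

    links is a set of (source index, target index) alignment points.
    """
    links = set(links)
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, len(src)):
            for j1 in range(len(tgt)):
                for j2 in range(j1, len(tgt)):
                    # At least one alignment point must lie inside the box.
                    if not any(i1 <= i <= i2 and j1 <= j <= j2
                               for i, j in links):
                        continue
                    # No alignment point may link inside to outside.
                    if any((i1 <= i <= i2) != (j1 <= j <= j2)
                           for i, j in links):
                        continue
                    # Assumed 'strict' reading: boundary words are aligned.
                    if strict and not ({i1, i2} <= aligned_src
                                       and {j1, j2} <= aligned_tgt):
                        continue
                    pairs.add(("".join(src[i1:i2 + 1]),
                               "".join(tgt[j1:j2 + 1])))
    return pairs


src, tgt = ["a", "b", "c"], ["A", "B", "C"]
links = {(0, 0), (2, 2)}  # a-A and c-C; b and B are unaligned

loose = extract(src, tgt, links)
strict = extract(src, tgt, links, strict=True)
print(sorted(loose))
print(sorted(strict))
```

Under these assumptions the strict variant keeps only a=A, c=C and abc=ABC,
while the usual consistency criterion yields nine pairs here, including ab=AB
and bc=BC (and also pairs such as a=AB and ab=A).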
Thanks,
Ondrej.
--
Ondrej Bojar (mailto:[EMAIL PROTECTED] / [EMAIL PROTECTED])
http://www.cuni.cz/~obo
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support