Dear Moses Support Team,
I have added a source-context-dependent translation feature to the Moses
baseline system.
To avoid modifying the Moses source code, I append a unique identifier
to every word in the test/dev source files.
For example, a source file with the following two lines:
this is sentence 1
. sentence 2
would become:
this~~1 is~~2 sentence~~3 1~~4
.~~5 sentence~~6 2~~7
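
The tagging step described above can be sketched in Python as follows. This is only an illustration, not the script actually used: the function name tag_lines and its parameters are hypothetical, and it assumes whitespace-tokenized input with IDs that keep counting across lines, as in the example.

```python
def tag_lines(lines, sep="~~", start=1):
    """Append a running numeric ID to every whitespace-separated token.

    Hypothetical helper: the counter is shared across all lines, so each
    token in the file receives a distinct ID.
    """
    counter = start
    tagged = []
    for line in lines:
        out = []
        for tok in line.split():
            out.append(f"{tok}{sep}{counter}")
            counter += 1
        tagged.append(" ".join(out))
    return tagged


if __name__ == "__main__":
    for tagged_line in tag_lines(["this is sentence 1", ". sentence 2"]):
        print(tagged_line)
```

Run on the two example lines, this reproduces the tagged lines shown above.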
Then I generate a sentence-specific phrase table for each sentence, using the
same IDs as the source-file words in the phrase table entries.
I concatenate all the phrase tables, then run MERT and the decoder as usual.
I ran my experiments on Chinese-to-English translation tasks, and I found that
OOV words in the output file still carry their IDs.
For example, the translation of one NIST03 sentence is as follows:
published by the british science weekly , according to the study by the 14th
on chromosome sequencing of genes and gene segments 一千零五十~~97 .
Here, ~~97 is the ID of the word "一千零五十".
I found that when I remove the IDs from the output file, the BLEU scores are
significantly different. I have no idea why this happens. Could you give me
some advice?
I use the mteval-v13a.pl script to calculate BLEU scores in my experiments.
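
For concreteness, removing the IDs from the decoder output could look like the sketch below. It is an assumption about the post-processing, not the exact command used: the function name strip_ids is hypothetical, and it assumes that every ID has the form "~~" followed by digits at the end of a token and that no genuine output token ends that way.

```python
import re

# Matches "~~<digits>" only at the end of a whitespace-delimited token.
ID_SUFFIX = re.compile(r"~~\d+(?=\s|$)")


def strip_ids(line):
    """Remove trailing "~~N" IDs from every token in one output line."""
    return ID_SUFFIX.sub("", line)


if __name__ == "__main__":
    print(strip_ids("and gene segments 一千零五十~~97 ."))
```

Applied to the end of the NIST03 example above, this leaves the OOV word in place without its ID.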
Thanks,
Liang