Hi Liang, mteval-v13a.pl does some internal tokenization and probably splits those "<UNKNOWN>~~<ID>" words into "<UNKNOWN> ~ ~ <ID>". If this is happening, it explains your difference in the calculated BLEU scores.
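To see the effect, here is a minimal sketch in Python. It only approximates the punctuation-splitting step I believe mteval-v13a.pl applies (spaces around most ASCII punctuation, including "~"); it is not the script's actual code. The names approx_tokenize and strip_ids are hypothetical helpers made up for this illustration.

```python
import re
from typing import List

# Assumption: rough stand-in for the scorer's punctuation splitting.
# The class covers several ASCII punctuation ranges, including "~" (0x7e);
# digits, letters, and non-ASCII characters are left alone.
PUNCT = re.compile(r"([\x7b-\x7e\x5b-\x60\x20-\x26\x28-\x2b\x3a-\x40\x2f])")

# Hypothetical helper: matches the "~~<ID>" suffix described in this thread.
ID_SUFFIX = re.compile(r"~~\d+")


def approx_tokenize(line: str) -> List[str]:
    """Pad matching punctuation with spaces, then split on whitespace."""
    return PUNCT.sub(r" \1 ", line).split()


def strip_ids(line: str) -> str:
    """Remove the '~~<ID>' suffixes before scoring."""
    return ID_SUFFIX.sub("", line)


if __name__ == "__main__":
    oov = "一千零五十~~97"
    print(approx_tokenize(oov))             # one word turns into four tokens
    print(approx_tokenize(strip_ids(oov)))  # a single token once the ID is gone
```

Under this approximation, every untranslated word that keeps its ID contributes three extra tokens ("~", "~" and the number) that cannot match the reference, which changes both the n-gram precisions and the hypothesis length. Stripping the IDs before scoring avoids that.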
Cheers,
Matthias

On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote:
> Dear Moses Support Team,
>
> I added a source context-dependent translation feature to the Moses
> baseline system.
> To avoid modifying the source code, I append a unique identifier
> to every word in the test/dev source file.
> For example, a source file with two lines like the following:
> this is sentence 1
> . sentence 2
> would become this~~1 is~~2 sentence~~3 1~~4, .~~5 sentence~~6 2~~7.
> Then I generate sentence-specific phrase tables for each sentence, using
> the same IDs as the source file words in those phrase table entries.
> I concatenate all the phrase tables, then run MERT and the decoder as usual.
>
> I run my experiments on Chinese-to-English translation tasks, and I found that
> the OOV words in the output file still have their IDs.
> E.g. the translation of one NIST03 sentence is as follows:
> published by the british science weekly , according to the study by the 14th
> on chromosome sequencing of genes and gene segments 一千零五十~~97 .
> 一千零五十~~97 ~~97 is the ID of the word "一千零五十".
> I found that when I remove the IDs from the output file, the BLEU scores are
> significantly different. I have no idea what is happening. Could you give me
> some advice?
> I use the mteval-v13a.pl script to calculate BLEU scores in my experiments.
>
> Thanks,
> Liang
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.