Hi Liang,

mteval-v13a.pl applies its own internal tokenization and probably splits those
"<UNKNOWN>~~<ID>" words into "<UNKNOWN> ~ ~ <ID>". If this is happening, the
extra tokens change the n-gram counts and the hypothesis length, which
explains the difference in your calculated BLEU scores.
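
To make the suspected effect concrete, here is a minimal Python sketch. It is
an illustration only: split_tildes() is a hypothetical helper that imitates
just the "~" splitting described above, not the actual tokenization rules of
mteval-v13a.pl, and strip_ids() assumes the IDs always have the form "~~"
followed by digits at the end of a token.

```python
import re

# Assumption: an ID is "~~" followed by digits at the end of a token.
ID_SUFFIX = re.compile(r"~~\d+(?=\s|$)")


def strip_ids(line: str) -> str:
    """Remove trailing ~~<ID> markers from every token in a line."""
    return ID_SUFFIX.sub("", line)


def split_tildes(line: str) -> str:
    """Imitate only the suspected effect: each '~' becomes its own token.

    This is NOT the real mteval-v13a.pl tokenizer, just a stand-in.
    """
    return " ".join(line.replace("~", " ~ ").split())


if __name__ == "__main__":
    hyp = "gene segments 一千零五十~~97 ."
    print(split_tildes(hyp))  # what the scorer might see if IDs are left in
    print(strip_ids(hyp))     # what it sees once the IDs are removed
```

If the scorer does split the token this way, an OOV word with an ID turns
into four tokens instead of one, so stripping the IDs from the decoder output
before scoring would be the consistent way to compare systems.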

Cheers,
Matthias


On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote:
> Dear Moses Support Team,
>   
>    I added a source context-dependent translation feature to the Moses
> baseline system.
>    To avoid modifying the source code, I append a unique identifier
> to every word in the test/dev source file.
>    For example, a source file with the following two lines:
>       this is sentence 1
>      .  sentence 2
> would become this~~1 is~~2 sentence~~3 1~~4, .~~5 sentence~~6 2~~7.
> Then I generate sentence-specific phrase tables for each sentence, using
> the same IDs as the source-file words in the phrase table entries.
> I concatenate all the phrase tables, then run MERT and the decoder as usual.
>  
> I ran my experiments on Chinese-to-English translation tasks, and I found
> that OOV words in the output file still carry their IDs.
> E.g., the translation of one NIST03 sentence is as follows:
>  published by the british science weekly , according to the study by the 14th 
> on chromosome sequencing of genes and gene segments 一千零五十~~97 .
>      In 一千零五十~~97, ~~97 is the ID of the word "一千零五十".
> I found that when I remove the IDs from the output file, the BLEU scores are
> significantly different. I have no idea what is happening. Could you give me
> some advice?
> I use the mteval-v13a.pl script to calculate BLEU scores in my experiments.
> 
> 
> 
> 
> Thanks,
> Liang
> 
> 
>        
> 
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
