Hi, one difference between the BLEU score reported in the analysis and the NIST BLEU score is that the former uses the tokenization of the Moses pipeline, while the NIST tool applies its own tokenization to the detokenized output. This leads to different scores, although the differences are usually minor.
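A minimal sketch of why tokenization alone can move the score: a plain corpus-level BLEU (one reference per sentence, up to 4-grams, clipped n-gram counts, brevity penalty) applied to the same sentence pair under two tokenizations. Both tokenizers (`whitespace_tokenize`, `punct_tokenize`) are hypothetical stand-ins. Neither reproduces the Moses tokenizer or the NIST script's internal tokenization; they only show that splitting off punctuation changes the n-gram counts and therefore the score.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU with a single reference per sentence.

    Clipped n-gram matches and hypothesis n-gram totals are summed over
    all sentences first, then combined as a geometric mean of precisions
    times a brevity penalty.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped counts
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


# Hypothetical tokenizer 1: split on whitespace only ("mat." stays one token).
def whitespace_tokenize(text):
    return text.split()


# Hypothetical tokenizer 2: split punctuation off into separate tokens.
def punct_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)


hyp_text = ["the cat sat on the mat."]
ref_text = ["the cat sat on a mat."]

score_ws = corpus_bleu([whitespace_tokenize(s) for s in hyp_text],
                       [whitespace_tokenize(s) for s in ref_text])
score_punct = corpus_bleu([punct_tokenize(s) for s in hyp_text],
                          [punct_tokenize(s) for s in ref_text])

if __name__ == "__main__":
    print(f"whitespace tokenization:  BLEU = {score_ws:.4f}")
    print(f"punctuation tokenization: BLEU = {score_punct:.4f}")
```

For this single sentence pair the two scores come out at roughly 0.54 and 0.49: the same translation, scored by the same formula, differs only in how tokens were counted.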
About the line numbering: yes, this may be annoying, but it was designed by a computer scientist, and computer scientists famously start counting at 0.

-phi

On Mon, Sep 14, 2015 at 6:13 AM, Vincent Nguyen <[email protected]> wrote:
> Guys,
>
> While running EMS with a big test file I realized that the analysis.perl
> was executed very quickly while the actual Nist-Bleu was much much longer.
>
> Also one thing is that the file "BLEU-Annotation" generated during
> analysis does not contain the right line numbering.
> it takes 0 as the first line thus, all line number are offset by 1.
>
> Last, when you "average" the BLEU score from all these lines, it is not
> the actual Nist BLEU score reported, slightly different.
>
> Is it computed differently ?
>
> Thanks,
>
> Vincent
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
