Hi,

One difference between the BLEU score reported in the analysis
and the NIST BLEU score is that the former uses the tokenization
from the Moses pipeline, whereas the NIST tool applies its own
tokenization to the detokenized output. This leads to different
scores, although the differences are mostly minor.
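
To make this concrete, here is a minimal Python sketch. It is not the
code used by analysis.perl or by the NIST script: the two tokenizers
and the unsmoothed BLEU below are simplified stand-ins, and all function
names are made up for illustration. It shows that the same detokenized
text scores differently under two tokenizations. It also shows that
corpus-level BLEU, which pools n-gram counts over all sentences, is in
general not the average of per-sentence BLEU scores.

```python
import math
import re
from collections import Counter


def tokenize_whitespace(text):
    """Hypothetical tokenizer A: split on whitespace only."""
    return text.split()


def tokenize_punct(text):
    """Hypothetical tokenizer B: also split punctuation off words."""
    return re.findall(r"\w+|[^\w\s]", text)


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_stats(hyp, ref, max_n=4):
    """Clipped n-gram matches, n-gram totals, and lengths for one pair."""
    matches, totals = [], []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        matches.append(sum(min(c, ref_counts[g]) for g, c in hyp_counts.items()))
        totals.append(max(len(hyp) - n + 1, 0))
    return matches, totals, len(hyp), len(ref)


def bleu_from_stats(matches, totals, hyp_len, ref_len):
    """Unsmoothed BLEU: geometric mean of precisions times brevity penalty."""
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives 0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / len(matches)
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def sentence_bleu(hyp, ref, max_n=4):
    """BLEU computed on a single sentence pair."""
    return bleu_from_stats(*bleu_stats(hyp, ref, max_n))


def corpus_bleu(pairs, max_n=4):
    """BLEU computed from statistics summed over all sentence pairs."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in pairs:
        m, t, h, r = bleu_stats(hyp, ref, max_n)
        matches = [a + b for a, b in zip(matches, m)]
        totals = [a + b for a, b in zip(totals, t)]
        hyp_len += h
        ref_len += r
    return bleu_from_stats(matches, totals, hyp_len, ref_len)


if __name__ == "__main__":
    # Effect 1: same text, different tokenization, different score.
    hyp_text = "the cat sat, the dog ran."
    ref_text = "the cat sat, the dog slept."
    for tok in (tokenize_whitespace, tokenize_punct):
        print(tok.__name__, sentence_bleu(tok(hyp_text), tok(ref_text)))

    # Effect 2: corpus-level BLEU is not the mean of sentence-level BLEU.
    pairs = [
        ("a b c d".split(), "a b c d".split()),
        ("a b c x y z w v".split(), "a b c d e f g h".split()),
    ]
    average = sum(sentence_bleu(h, r) for h, r in pairs) / len(pairs)
    print("average of sentence scores:", average)
    print("corpus-level score:", corpus_bleu(pairs))
```

In the second example the sentence without any 4-gram match scores 0
on its own, but its lower-order matches still count toward the pooled
corpus statistics, so the two numbers diverge.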

About the line numbering: yes, this may be annoying, but it was
designed by a computer scientist, and computer scientists famously
start counting at 0.

-phi

On Mon, Sep 14, 2015 at 6:13 AM, Vincent Nguyen <[email protected]> wrote:

> Guys,
>
> While running EMS with a big test file, I noticed that analysis.perl
> ran very quickly, while the actual NIST BLEU scoring took much longer.
>
> Also, the file "BLEU-Annotation" generated during
> analysis does not contain the right line numbering.
> It takes 0 as the first line, so all line numbers are offset by 1.
>
> Last, when you "average" the BLEU scores from all these lines, the result
> is not the actual NIST BLEU score reported; it is slightly different.
>
> Is it computed differently?
>
> Thanks,
>
> Vincent
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>