> Here's what I didn't expect. I shuffled the order of the pairs in the evaluation set and ran mteval-12.pl again for each set. For each set,
> the same data shuffled in a different order and run through mteval-12.pl resulted in different cumulative BLEU scores. These scores
> varied from 0.8520 to 0.8627. Same data, different evaluation order.
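For comparison, here is a minimal sketch of corpus-level BLEU in the standard formulation. It is not mteval-12.pl's code. It assumes a single reference per segment, whitespace tokenization, and no smoothing, and the function names are hypothetical. Clipped n-gram matches and lengths are summed as integers over the whole set before any division, so shuffling the pairs should give exactly the same score. The last line shows the kind of order dependence floating-point addition does have: a difference in the last bits of a double.

```python
import math
import random
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(pairs, max_n=4):
    """Corpus-level BLEU for (hypothesis, reference) string pairs.

    Sketch assumptions: one reference per segment, whitespace tokenization,
    no smoothing. Counts are accumulated as integers over the whole corpus.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in pairs:
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty uses total corpus lengths, not per-segment lengths.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


# Hypothetical toy data standing in for the evaluation set.
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("a quick brown fox jumps", "the quick brown fox jumps"),
    ("he reads the book today", "he is reading the book today"),
]
baseline = corpus_bleu(pairs)
for seed in range(5):
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    # Integer accumulation makes the score identical, not merely close.
    assert corpus_bleu(shuffled) == baseline

# Floating-point addition is not associative, but only at this scale.
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # 0.6000000000000001 0.6
```

Running the same kind of check on the real pairs can show whether ordering affects the score at all under this definition. mteval-12.pl does its own tokenization and normalization, so absolute scores from this sketch will not match its output.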
That is extremely odd. The only thing I can think of is a floating-point precision problem, or a bug in mteval-12.pl. Would it be possible to send me the dataset you're using? I'll take a look at it.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
