Hi, I have a 2k-sentence tuning set, a 1k-sentence evaluation set and a 70k-sentence training corpus in the IT software domain. After tuning I get a slightly lower BLEU score, but the reordering is much better, so the subjective translation quality is higher.
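BLEU only rewards better word order indirectly, through matching higher-order n-grams. The original METEOR formulation (Banerjee and Lavie, 2005) adds an explicit fragmentation penalty instead: matched words are grouped into "chunks" that are contiguous and in the same order in both hypothesis and reference, and the more chunks there are, the larger the penalty. The sketch below is a simplified illustration under stated assumptions, not the CMU METEOR tool: the function name `meteor_exact` is hypothetical, it uses exact word matching only (no stemming or synonyms), a greedy alignment rather than METEOR's minimum-crossing alignment, and the original 2005 parameters (Fmean = 10PR/(R+9P), penalty = 0.5 * (chunks/matches)^3). Current METEOR releases use tuned parameters and extra matching modules, so their scores will differ.

```python
def meteor_exact(hypothesis: str, reference: str) -> float:
    """Simplified METEOR-style score: exact matches only, greedy alignment,
    original 2005 parameters. Hypothetical helper for illustration."""
    hyp = hypothesis.split()
    ref = reference.split()

    # Greedy alignment (a simplification): each hypothesis token is linked
    # to the first unused identical reference token.
    used = [False] * len(ref)
    alignment = []  # (hyp_index, ref_index) pairs, ordered by hyp_index
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if not used[j] and rtok == tok:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Harmonic mean weighted heavily towards recall, as in the 2005 paper.
    fmean = 10 * precision * recall / (recall + 9 * precision)

    # A new chunk starts whenever consecutive matches are not adjacent
    # in both the hypothesis and the reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1

    # Fragmentation penalty: grows with the number of chunks, capped at 0.5.
    penalty = 0.5 * (chunks / matches) ** 3
    return fmean * (1 - penalty)


if __name__ == "__main__":
    ref = "the cat sat on a mat"
    print(meteor_exact("the cat sat on a mat", ref))  # one chunk, tiny penalty
    print(meteor_exact("on a mat the cat sat", ref))  # two chunks
    print(meteor_exact("mat a on sat cat the", ref))  # six chunks: 0.5
```

All three hypotheses contain exactly the same words, so unigram precision and recall are identical; only the chunk count, and therefore the score, changes with word order. With repeated words the greedy alignment can split matches into more chunks than real METEOR would.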
In this case I wouldn't recommend using BLEU as the metric, but rather METEOR, which gives me a more accurate quality measurement: http://www.cs.cmu.edu/~alavie/METEOR/examples.html

Regards,
Andreas

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
