Another possibility is that the "noise" in the development set is simply that its translations are longer or shorter than those in the test set.
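One quick way to check for this kind of length mismatch is to compare the total hypothesis length c with the effective reference length r, and the resulting BLEU brevity penalty (exp(1 - r/c) when c < r, otherwise 1), on the dev set and on the test set. Below is a minimal Python sketch, assuming tokenized hypotheses and one or more tokenized references per segment. The function names, the two effective-reference-length options ("closest" and "shortest"), and the rule that ties go to the shorter reference are illustrative assumptions. They are not taken from any particular BLEU script.

```python
import math
from typing import List, Sequence, Tuple


def effective_ref_length(hyp_len: int, ref_lens: Sequence[int], method: str) -> int:
    """Pick the reference length the brevity penalty compares against (illustrative)."""
    if method == "shortest":
        return min(ref_lens)
    if method == "closest":
        # Reference length nearest to the hypothesis length;
        # ties go to the shorter reference (an assumption, scripts differ).
        return min(ref_lens, key=lambda r: (abs(r - hyp_len), r))
    raise ValueError(f"unknown method: {method}")


def length_stats(
    hyps: List[List[str]],
    refs: List[List[List[str]]],
    method: str = "closest",
) -> Tuple[int, int, float, float]:
    """Return (c, r, c/r, brevity penalty) for a tokenized corpus."""
    # c: total hypothesis length over the corpus.
    c = sum(len(h) for h in hyps)
    # r: sum of the per-segment effective reference lengths.
    r = sum(
        effective_ref_length(len(h), [len(x) for x in rs], method)
        for h, rs in zip(hyps, refs)
    )
    ratio = c / r if r else 0.0
    if c == 0:
        bp = 0.0
    elif c >= r:
        bp = 1.0  # no penalty for hypotheses at least as long as the references
    else:
        bp = math.exp(1.0 - r / c)
    return c, r, ratio, bp


if __name__ == "__main__":
    hyps = [["a", "b", "c"], ["d", "e"]]
    refs = [[["a", "b", "c", "d"], ["a"]], [["d", "e", "f"]]]
    for method in ("closest", "shortest"):
        print(method, length_stats(hyps, refs, method))
```

In this toy example the hypotheses total 5 words. With "closest", r is 7 and the penalty is exp(1 - 7/5), about 0.67. With "shortest", r is 4 and no penalty applies. Running the same check on the dev and test sets shows whether tuning is aiming at a different length than the test set rewards, and whether the choice of brevity penalty variant changes that picture.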
The attached plot shows several variants of BLEU for many different systems, obtained simply by varying the weight of the length feature while holding all other weights constant. The main thing to observe is that BLEU is sharply peaked around the hypothesis length that matches the effective test set length (as determined by the implementation of the brevity penalty, which is what distinguishes the BLEU variants plotted here).

I suspect that a primary function of MERT in a system like Moses is setting this length correctly, since most of the features are overlapping and/or useless. If, while tuning, MERT finds a peak in the development set error surface that is offset from the peak of the test set error surface (as a function of the parameters), then the effects on the test set would be unpredictable. It is always advisable to check the BLEU brevity penalty calculation to make sure that something like this isn't going on.

In either case, you should of course follow the advice of Jonathan Clark, just to ensure that what you're looking at isn't an outlier:
http://aclweb.org/anthology-new/P/P11/P11-1042.pdf

Cheers
Adam

On Thu, Jul 7, 2011 at 3:47 AM, Andreas Kull <[email protected]> wrote:
> Hi,
>
> I have a 2k-sentence tuning set, a 1k-sentence evaluation set and a 70k-sentence
> training corpus in the IT software domain. After tuning I get a slightly lower
> BLEU score, but the reordering is way better and therefore the subjective
> translation quality is better.
>
> In this case I wouldn't recommend using BLEU as a metric, but METEOR, which
> gives me a more accurate quality measurement:
>
> http://www.cs.cmu.edu/~alavie/METEOR/examples.html
>
> Regards,
> Andreas
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
<<attachment: Bleu-vs-hypothesis-length.jpg>>
