Re: [Moses-support] does mert usually enhance BLEU on a test set?

Adam Lopez Thu, 07 Jul 2011 05:22:52 -0700

Another possibility is that the "noise" in the development set is
simply that it has longer or shorter translations than the test set.

The attached plot shows several variants of BLEU against many
different systems obtained simply by varying the weight of the length
feature, holding all others constant.  The main thing to observe is
that BLEU is sharply peaked around a hypothesis length that matches
the effective test set length (as enforced by the implementation of
the brevity penalty, which is the difference between the BLEU variants
plotted here).  I suspect that a primary function of MERT in a system
like Moses is setting this length correctly, since most of the
features are overlapping and/or useless.  If, during tuning, MERT finds
a peak in the development set error surface that is offset from the
peak of the test set error surface (as a function of the parameters),
then the effect on test set BLEU will be unpredictable.  It is always
advisable to check the BLEU brevity penalty calculation to make sure
that something like this isn't going on.
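For anyone who wants to run that check by hand, here is a minimal sketch of the standard BLEU brevity penalty, BP = 1 if c > r, else exp(1 - r/c), where c is the total hypothesis length and r the effective reference length. It assumes the BLEU variants differ only in how r is chosen when there are several references ("closest", "shortest" or "average" per sentence). These three options are common choices, but the post does not say which ones the attached plot uses, and the helper names `effective_ref_length` and `length_report` are hypothetical, not from any toolkit.

```python
import math
from typing import Dict, Sequence


def effective_ref_length(hyp_lens: Sequence[int],
                         ref_lens: Sequence[Sequence[int]],
                         method: str = "closest") -> float:
    """Corpus-level effective reference length r (hypothetical helper).

    hyp_lens[i] is the token length of hypothesis i; ref_lens[i] lists the
    token lengths of its references. The method names are assumptions about
    how BLEU variants pick r, not any specific implementation's options.
    """
    total = 0.0
    for h, refs in zip(hyp_lens, ref_lens):
        if method == "closest":
            # Reference length closest to the hypothesis; ties go to the shorter one.
            total += min(refs, key=lambda r: (abs(r - h), r))
        elif method == "shortest":
            total += min(refs)
        elif method == "average":
            total += sum(refs) / len(refs)
        else:
            raise ValueError(f"unknown method: {method}")
    return total


def brevity_penalty(c: float, r: float) -> float:
    """Standard BLEU brevity penalty: 1 if c > r, else exp(1 - r/c)."""
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1.0 - r / c)


def length_report(hyp_lens: Sequence[int],
                  ref_lens: Sequence[Sequence[int]],
                  method: str = "closest") -> Dict[str, float]:
    """Summarize hypothesis length, effective reference length, ratio and BP."""
    c = float(sum(hyp_lens))
    r = effective_ref_length(hyp_lens, ref_lens, method)
    return {"hyp_len": c, "ref_len": r, "ratio": c / r,
            "bp": brevity_penalty(c, r)}


if __name__ == "__main__":
    # Toy lengths: two sentences, two references each.
    hyps = [9, 12]
    refs = [[10, 8], [12, 15]]
    for m in ("closest", "shortest", "average"):
        print(m, length_report(hyps, refs, m))
```

On these toy lengths the same 21-token output gets no penalty under "closest" (r = 20) but a BP of about 0.93 under "average" (r = 22.5). That is the kind of discrepancy worth ruling out when dev and test BLEU move in different directions.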

In either case, you should of course follow the advice of Jonathan
Clark, just to ensure that what you're looking at isn't an outlier.
http://aclweb.org/anthology-new/P/P11/P11-1042.pdf

Cheers
Adam

On Thu, Jul 7, 2011 at 3:47 AM, Andreas Kull wrote:
> Hi,
>
> I have a 2k-sentence tuning set, a 1k evaluation set and a 70k training
> corpus in the IT software domain. After tuning I get a slightly lower
> BLEU score, but the reordering is much better, so the subjective
> translation quality is higher.
>
> In this case I wouldn't recommend using BLEU as a metric, but rather
> METEOR, which gives me a more accurate quality measurement:
>
> http://www.cs.cmu.edu/~alavie/METEOR/examples.html
>
>
> Regards,
> Andreas
> _______________________________________________
> Moses-support mailing list
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

<<attachment: Bleu-vs-hypothesis-length.jpg>>
