Hello,

I'm joining this discussion because I've been having similar problems
over the last few days, translating from Swedish to Danish. I trained
a number of single-factor (surface-form-only) models, for which MERT
tuning worked nicely and yielded very noticeable performance
improvements. When I started using factored models, things suddenly
got worse.

At first, my MERT results were really bad because I was using the wrong
reference corpus. The input corpus needs to contain as many factors as
the translation model requires, but the reference corpus must contain
only one factor (the surface word form), so with a factored model you
cannot use the same variant of the corpus for both input and reference.
This may be obvious (and in a way it is), but it took me some time to
figure out.
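
In case it saves someone else the detour, here is a minimal Python
sketch for stripping a factored (target-side) file down to surface
forms so it can serve as a MERT reference. It assumes the standard
Moses '|' factor separator; the file names, example tokens and POS
tags are made up for illustration:

    import sys

    # Keep only the first factor (the surface form) of each token.
    # A factored target-side line such as
    #     jeg|PRON ser|VERB huset|NOUN
    # becomes the plain reference line
    #     jeg ser huset
    for line in sys.stdin:
        tokens = line.split()
        print(" ".join(tok.split("|")[0] for tok in tokens))

Saving this as, say, strip_factors.py and running
'python strip_factors.py < devtest.factored > devtest.ref' gives a
file you can hand to mert-moses.pl as the reference.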

After fixing this error, the dramatic performance drop from the first
experiments went away, but the results are still not as good as they
could be. MERT optimisation now sometimes improves the scores, but when
it does, the improvement is only around half a BLEU point, whereas it
used to gain several BLEU points in the earlier experiments. Sometimes
it slightly degrades performance instead.

There shouldn't be a corpus problem, as I've been using the same
training, devtest and test corpora for both the more successful and the
less successful experiments. The devtest corpus contains 1000 sentences.
Is there any particular reason why MERT should perform worse with
factored models? I wondered whether the number of parameters to be
optimised might have an effect, but one of the models I'm having
problems with has only one additional weight (a POS language model), so
we're not talking about a parameter explosion.
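
For concreteness, in moses.ini the difference between the two
configurations amounts to one extra language-model line and one extra
weight, roughly like the sketch below. The paths, orders and values are
placeholders, not my actual setup, and I'm going by my understanding
that each [lmodel-file] line reads
<lm-implementation> <output-factor> <order> <file>:

    [lmodel-file]
    0 0 3 /path/to/surface.lm
    0 1 3 /path/to/pos.lm

    [weight-l]
    0.5
    0.5

So MERT only has to fit a single extra [weight-l] entry here.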

Best,
Christian