Hi Jörg

In each MERT iteration, the first action is to decode the tuning set and create an n-best list, using the current weight set. The 1-bests from this decoding run are the hypotheses which get scored by --return-best-dev.

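
As a toy illustration of why the score seen after reranking an n-best list can differ from what you get when you decode again with the new weights, here is a minimal Python sketch. It is not Moses code: the hypothesis names, feature values, weights and the `quality` field (a stand-in for a hypothesis's contribution to BLEU) are all made up, and the "decoder" simply sorts a tiny, fully enumerated search space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    text: str         # hypothetical translation label
    features: tuple   # hypothetical feature values
    quality: float    # stand-in for a BLEU contribution (made up)


def model_score(hyp, weights):
    """Linear model score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, hyp.features))


def decode(space, weights, n):
    """Toy 'decoder': the n highest-scoring hypotheses in the whole space."""
    return sorted(space, key=lambda h: model_score(h, weights), reverse=True)[:n]


def rerank(nbest, weights):
    """Best hypothesis from a fixed n-best list under a different weight set."""
    return max(nbest, key=lambda h: model_score(h, weights))


# Full search space for one sentence (assumed values, for illustration only).
SPACE = [
    Hyp("A", (1.0, 0.0), 0.30),
    Hyp("B", (0.8, 0.5), 0.50),
    Hyp("C", (0.0, 1.0), 0.20),
]

old_w = (1.0, 0.0)   # weights used for decoding in this iteration
new_w = (0.2, 1.0)   # weights found by optimising on the n-best list

nbest = decode(SPACE, old_w, n=2)              # C does not make the 2-best list
decoded_best = nbest[0]                        # 1-best under the old weights
reranked_best = rerank(nbest, new_w)           # best within the n-best list
redecoded_best = decode(SPACE, new_w, n=1)[0]  # 1-best when decoding again

for label, hyp in [("decoded", decoded_best),
                   ("reranked", reranked_best),
                   ("redecoded", redecoded_best)]:
    print(f"{label:10s} {hyp.text}  quality={hyp.quality:.2f}")
```

With these made-up numbers, decoding with the old weights picks A, reranking the 2-best list with the new weights picks B, and decoding again with the new weights picks C, which was never in the n-best list and has a lower quality than B.
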
After that decoding, MERT searches for a weight set that can rerank the n-best lists to give a better BLEU, and stops when it reaches a local maximum. This is the BLEU that is reported in the moses.ini file. So it is a BLEU obtained by decoding with one weight set, and then reranking with a different weight set.

When you redecode using the new weight set you do not get the same set of translations, since the n-best list is just a tiny sample of the hypotheses that are considered during decoding, so there will normally be hypotheses outwith the n-best list which have a higher model score.

We haven't generally used --return-best-dev with MERT - does it help? It's really designed for PRO and kbmira.

cheers - Barry

On 06/03/14 11:28, Jorg Tiedemann wrote:
> Hi,
>
> I have a question about the --return-best-dev flag in mert-moses.pl
> I have run several experiments using this flag and I don't really
> understand how it influences the choice of settings during MERT. In
> many cases, the system will select an early iteration which is much
> lower in terms of BLEU than many later iterations. Maybe my confusion
> is related to the BLEU score mentioned in the moses.ini files printed
> after each iteration? Can someone help me? Thanks!
>
>
> Cheers,
> Jörg
>
>
> Jörg Tiedemann
> [email protected]
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
