Please welcome a new parameter tuner to Moses: k-best batch MIRA. This is hope-fear MIRA built as a drop-in replacement for MERT; it conducts online training on aggregated k-best lists, which serve as an approximation to the decoder's true search space. This allows it to handle large feature sets, and it often outperforms MERT once feature counts get above 10. The new code has been pushed into the master branch, as well as into miramerge.
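To give a rough idea of what happens inside each tuning iteration, here is a minimal sketch of the hope-fear MIRA update applied to an aggregated k-best list. This is only an illustration of the idea from the paper, not the kbmira implementation: the Hypothesis class, the dense-dict feature vectors, and the single-pass loop are simplified assumptions of mine, and the per-sentence BLEU field stands in for whatever sentence-level approximation the tuner actually uses.

# Illustrative sketch only -- not the kbmira code. Assumes each hypothesis
# carries a sparse feature vector and a sentence-level BLEU approximation.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    features: dict   # feature name -> value
    bleu: float      # sentence-level BLEU approximation

def dot(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def mira_pass(weights, kbest_lists, C=0.01):
    """One online pass over aggregated k-best lists (one of the inner loops)."""
    for kbest in kbest_lists:
        # Hope: high model score AND high BLEU; fear: high model score, low BLEU.
        hope = max(kbest, key=lambda h: dot(weights, h.features) + h.bleu)
        fear = max(kbest, key=lambda h: dot(weights, h.features) - h.bleu)
        # How badly the model prefers fear over hope, relative to their BLEU gap.
        delta = {k: hope.features.get(k, 0.0) - fear.features.get(k, 0.0)
                 for k in set(hope.features) | set(fear.features)}
        loss = (hope.bleu - fear.bleu) - dot(weights, delta)
        norm = sum(v * v for v in delta.values())
        if loss > 0 and norm > 0:
            eta = min(C, loss / norm)   # step size capped by the C-value described below
            for k, v in delta.items():
                weights[k] = weights.get(k, 0.0) + eta * v
    return weights

In kbmira this kind of pass is repeated many times over the aggregated lists (the -J option below), and the C cap corresponds to the -C option.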
You can tune using this system by adding [--batch-mira] to your mert-moses.pl command. This replaces the normal call to the mert executable with a call to kbmira.

I recommend also adding the flag [--return-best-dev] to mert-moses.pl. This will copy the moses.ini file corresponding to the highest-scoring development run (as determined by the evaluator executable using BLEU on run*.out) into the final moses.ini. This can make a fairly big difference for MIRA's test-time accuracy.

You can also pass through options to kbmira by adding [--batch-mira-args 'whatever'] to mert-moses.pl. Useful kbmira options include:

[-J n] : changes the number of inner MIRA loops to n passes over the data. Increasing this value to 100 or 300 can be good for working with small development sets. The default, 60, is ideal for development sets with more than 1000 sentences.

[-C n] : changes MIRA's C-value to n. This controls regularization. The default, 0.01, works well for most situations, but if it looks like MIRA is over-fitting or not converging, decreasing C to 0.001 or 0.0001 can sometimes help.

[--streaming] : stream k-best lists from disk rather than load them into memory. This results in very slow training, but may be necessary in low-memory environments or with very large development sets.

Run kbmira --help for a full list of options.

So, a complete call might look like this:

$MOSES_SCRIPTS/training/mert-moses.pl \
  work/dev.fr work/dev.en \
  $MOSES_BIN/moses work/model/moses.ini \
  --mertdir $MOSES_BIN \
  --rootdir $MOSES_SCRIPTS \
  --batch-mira --return-best-dev \
  --batch-mira-args '-J 300' \
  --decoder-flags '-threads 8 -v 0'

Please give it a try. If it's not working as advertised, send me an e-mail and I'll see what I can do.

For more information on batch MIRA, or to cite us, check out our paper:

Colin Cherry and George Foster
Batch Tuning Strategies for Statistical Machine Translation
NAACL, June 2012
https://sites.google.com/site/colinacherry/Cherry_Foster_NAACL_2012.pdf

Anticipating some questions:

Q: Does it only handle BLEU?
A: Yes, for now. There's nothing stopping people from implementing other metrics, so long as a reasonable sentence-level version of the metric can be worked out. Note that you generally need to retune kbmira's C-value for different metrics. I'd also change --return-best-dev to use the new metric as well. (A rough sketch of the kind of sentence-level score I mean appears at the end of this message.)

Q: Have you tested this on a cluster?
A: No, I don't have access to a Sun Grid cluster - I would love it if someone would test that scenario for me. But it works just fine using multi-threaded decoding. Since training happens in a batch, decoding is embarrassingly parallel.
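Following up on the metric question above: below is a minimal sketch of a smoothed sentence-level BLEU, the kind of per-sentence approximation a new metric would need to supply. It is illustrative only; the function name and the add-one smoothing are my own choices, and kbmira's internal BLEU approximation may differ in its smoothing and brevity-penalty details.

# Illustrative only -- an add-one smoothed sentence-level BLEU, not
# necessarily the approximation kbmira uses internally.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU for one hypothesis/reference pair."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        # Add-one smoothing on every order so one empty match doesn't zero the score.
        log_prec += math.log((matches + 1.0) / (total + 1.0))
    # Brevity penalty.
    bp = min(1.0, math.exp(1.0 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec / max_n)

For example, sentence_bleu("the cat sat".split(), "the cat sat down".split()) returns a value between 0 and 1; scores of this sort would play the role of the per-hypothesis BLEU field in the update sketch near the top of this message.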
