Jia,

Yes, MERT's purpose is to optimize the configuration weights so that the BLEU score increases.
I had a similar case where MERT didn't change the BLEU scores. Our troubleshooting found that the tuning set wasn't prepared the same way as the training data, i.e. we forgot to lower-case and tokenize the tuning set. This is probably a good place for you to start.

Tom

On Mon, 21 Feb 2011 09:35:41 +1100, Suzy Howlett <[email protected]> wrote:
> Hi Jia,
>
> It could very well be that the training data isn't very good. Tuning
> changes how much each feature is weighted, but if the estimates of the
> feature values aren't reasonable in the first place, I can't imagine it
> helps too much. Perhaps you're not using enough training data, or the
> training data is just too different from your test data (e.g. genre)?
> Someone with more experience than me may be able to give you more advice.
>
> Best,
> Suzy
>
> On 21/02/11 2:46 AM, Jia Xu wrote:
>> Hi,
>>
>> In my experiments, tuning with mert-moses.pl or mert-moses-new.pl on
>> a development set did not improve the translation quality on a test
>> set; it was about half a percent worse in BLEU score (no tuning vs.
>> tuning). Does anyone have a similar experience, or did I call
>> anything incorrectly?
>>
>> nbest=100
>> dev: wmt-test08
>> test: wmt-test10
>> With/without tuning is achieved by turning off/on weight-config in
>> the config file.
>>
>> Thank you!
>> Best Wishes,
>> Jia

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
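A minimal sketch of the preprocessing Tom describes, assuming a standard Moses checkout; the paths, file names, and the "en" language code are placeholders and should be adjusted to match whatever was applied to the training data:

    # Tokenize and lowercase the tuning (dev) source and reference,
    # mirroring the preprocessing used for the training data.
    ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < dev.src > dev.tok.src
    ~/mosesdecoder/scripts/tokenizer/lowercase.perl < dev.tok.src > dev.lc.src
    ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < dev.ref > dev.tok.ref
    ~/mosesdecoder/scripts/tokenizer/lowercase.perl < dev.tok.ref > dev.lc.ref

    # Then tune on the preprocessed files (positional arguments:
    # dev source, dev reference, decoder binary, decoder config).
    ~/mosesdecoder/scripts/training/mert-moses.pl dev.lc.src dev.lc.ref \
        ~/mosesdecoder/bin/moses model/moses.ini --working-dir mert-work

If the training data was also cleaned or cased in some other way (e.g. truecased), the same steps need to be applied to the tuning and test sets as well.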
