Good to see the variance reduction. Why not repeat this with more features? You should see a greater effect that way. An easy way to do this is to just add more language models.
Miles

On 11 August 2011 19:53, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> I added a number of improvements to MERT that have been recently
> proposed in the literature, with the aim to support more features and
> greater stability.
>
> The improvements are:
> (1) Optimization in random directions [Cer et al., 2008]
> (2) Re-use of best weight settings from last n iterations as starting
>     points [Foster and Kuhn, 2009]
> (3) Pairwise-Ranked Optimization [Hopkins and May, 2011]
>
> To give some more details:
>
> (1) Traditional MERT optimizes each parameter in isolation, finding
> the best gain for any parameter, applying it, and repeating this process
> until convergence. With the switch "-number-of-random-directions NUM",
> a specified number of random directions are explored in addition to
> these per-parameter directions through the multi-dimensional weight space.
>
> (2) In each iteration of running the decoder to produce n-best lists
> and optimizing weights, the first starting point is the last best weight
> setting found. 20 additional starting points are randomly generated.
> With the switch "-historic-best", the best weights found in each prior
> iteration are used as starting points in addition to the random starting
> points.
>
> (3) A recent paper proposed an alternative to MERT that trains a classifier
> to predict which of two candidates in the n-best list is better. Candidates
> are randomly sampled (with a bias towards candidates with large metric
> score differences) and passed to a standard linear model classifier
> (maximum entropy, support vector machines, etc.). The current Moses
> implementation uses MegaM by Hal Daume (check for license terms).
> This alternative to traditional MERT can be used with the switch
> "-pairwise-ranked".
>
> Notes:
>
> * the indicated switches are either specified when calling mert-moses.pl
>   or in the parameter "tuning-settings" in EMS.
>
> * option (3) is incompatible with (1) and (2), but the latter two can be
>   used together.
>
> * for "-number-of-random-directions" I used 50 random directions, which
>   slows down MERT quite a bit.
>
> * option (3) does not converge under the current Moses stopping criteria,
>   so it runs for 25 iterations, but you may want to reduce this to 10 with
>   the additional switch "-max-iterations 10"
>
> Some results:
>
> Urdu-English, SAMT Model
>
> MERT setting                 iterations       tuning set        test set
> baseline                     11.6 (std 4.8)   22.73 (std 0.07)  21.54 (std 0.38)
> 50 random directions         9.4 (std 2.3)    22.82 (std 0.14)  *21.58* (std 0.38)
> rand.dir. + historic best    9.2 (std 5.9)    22.79 (std 0.23)  21.40 (std 0.37)
> pairwise-ranked max-iter 10  10               -                 21.33 *(std 0.13)*
>
> Urdu-English, Hierarchical Model
>
> MERT setting                 iterations       tuning set        test set
> baseline                     8.8 (std 2.2)    23.91 (std 0.18)  *23.02* (std 0.42)
> 50 random directions         8.4 (std 3.3)    23.85 (std 0.35)  22.80 (std 0.70)
> rand.dir. + historic best    12.0 (std 3.5)   24.03 (std 0.23)  22.89 *(std 0.18)*
> pairwise-ranked max-iter 10  10               -                 21.93 (std 0.36)
>
> German-English, Phrase-based
>
> MERT setting                 iterations       tuning set        test set
> baseline                     7.2 (std 14.3)   24.82 (std 0.04)  *21.29* (std 0.05)
> rand.dir. + historic best    6.6 (std 1.8)    24.88 (std 0.07)  21.28 (std 0.16)
> pairwise-ranked max-iter 10  10               -                 *21.29 (std 0.02)*
>
> German-English, Factored Backoff
>
> MERT setting                 iterations       tuning set        test set
> baseline                     12.0 (std 15.2)  24.89 (std 0.25)  21.35 (std 0.15)
> rand.dir. + historic best    11.4 (std 7.6)   25.01 (std 0.12)  21.45 (std 0.12)
> pairwise-ranked              25               -                 *21.58 (std 0.11)*
> pairwise-ranked max-iter 10  10               -                 21.54 (std 0.10)
>
> Results are reported over 5 runs of each optimization method, in terms of
> average and standard deviation. What we are looking for is high test set
> scores and low variance.
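For anyone reproducing this kind of comparison, the average and standard deviation over repeated runs can be computed as in the small Python sketch below. The scores in it are made-up placeholders, not values from the tables, and the function name is hypothetical. The message does not say whether the reported figures use the sample or the population standard deviation, so the sample version used here is an assumption.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (average, sample standard deviation) over repeated tuning runs."""
    return mean(scores), stdev(scores)


# Hypothetical test-set scores from 5 runs of one optimizer setting.
# These are placeholders for illustration only, not taken from the results.
example_scores = [21.0, 21.5, 22.0, 21.5, 21.5]

avg, std = summarize_runs(example_scores)
print(f"{avg:.2f} (std {std:.2f})")
```

Swapping stdev for pstdev from the same module gives the population standard deviation instead, which is slightly smaller for so few runs.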
>
> The Urdu-English systems use a smaller tuning set of fewer than 1000
> sentences (with 4 references), so I would tend to put less faith in
> those results. The test set for German-English is WMT 2011.
>
> Your mileage may vary, but it is worth a try.
>
> -phi
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
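The pair sampling step described under option (3) in the quoted message can be sketched roughly as follows. This is a minimal illustration in Python, not the Moses implementation: the function name, the parameters (n_samples, n_keep, min_diff) and their default values are all assumptions made for the example, and the resulting training examples would still have to be handed to a linear classifier such as MegaM.

```python
import random


def sample_training_pairs(nbest, n_samples=100, n_keep=10, min_diff=0.05, rng=None):
    """Sample candidate pairs from one n-best list for a pairwise classifier.

    nbest is a list of (feature_vector, metric_score) tuples for a single
    source sentence. Returns (feature_difference, label) examples, where the
    label is +1 if the first candidate of the pair scored higher, else -1.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    if len(nbest) < 2:
        return []

    # Draw random pairs and discard those whose metric scores are too close.
    candidates = []
    for _ in range(n_samples):
        (feat_a, score_a), (feat_b, score_b) = rng.sample(nbest, 2)
        diff = abs(score_a - score_b)
        if diff >= min_diff:
            candidates.append((diff, feat_a, score_a, feat_b, score_b))

    # Bias towards large metric differences: keep only the widest gaps.
    candidates.sort(key=lambda c: c[0], reverse=True)

    examples = []
    for _, feat_a, score_a, feat_b, score_b in candidates[:n_keep]:
        delta = [a - b for a, b in zip(feat_a, feat_b)]
        label = 1 if score_a > score_b else -1
        # Add the pair in both orders so the two classes stay balanced.
        examples.append((delta, label))
        examples.append(([-d for d in delta], -label))
    return examples
```

In the approach of Hopkins and May, the weights learned by the classifier on these difference vectors are then used as the new feature weights for the next decoding iteration.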
