good to see the variance reduction.

why not repeat this with more features?  you should see a greater effect
this way.  an easy way to do this is to just add more language models.

Miles

On 11 August 2011 19:53, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> I added a number of improvements to MERT that have recently been
> proposed in the literature, with the aim of supporting more features and
> greater stability.
>
> The improvements are:
> (1) Optimization in random directions [Cer et al., 2008]
> (2) Re-use of best weight settings from last n iterations as starting
> points [Foster and Kuhn, 2009]
> (3) Pairwise-Ranked Optimization [Hopkins and May, 2011]
>
> To give some more details:
>
> (1) Traditional MERT optimizes each parameter in isolation, finding
> the best gain for any parameter, applying it, and repeating this process
> until convergence. With the switch "-number-of-random-directions NUM",
> NUM random directions through the multi-dimensional weight space are
> explored in addition to these per-parameter directions.
>
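A minimal sketch of the idea in (1), to make the set of search directions concrete. It is not the Moses code: the function name, the Gaussian sampling, and the normalization to unit length are all assumptions made for illustration, since the mail does not say how the random directions are drawn.

```python
import math
import random


def search_directions(num_features, num_random, seed=0):
    """Return the coordinate axes plus `num_random` random unit vectors.

    Hypothetical helper: the axes stand for traditional per-parameter MERT,
    the random vectors for the extra directions (assumed Gaussian, normalized).
    """
    rng = random.Random(seed)
    # One direction per feature weight, as in traditional MERT.
    axes = [[1.0 if i == j else 0.0 for j in range(num_features)]
            for i in range(num_features)]
    randoms = []
    for _ in range(num_random):
        v = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
        norm = math.sqrt(sum(x * x for x in v))
        randoms.append([x / norm for x in v])
    return axes + randoms


directions = search_directions(num_features=4, num_random=3)
print(len(directions))
```

Each direction would then get its own line search, which is why 50 extra directions make a run noticeably slower.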
> (2) In each iteration of running the decoder to produce n-best lists
> and optimizing weights, the first starting point is the last best
> weight setting found. 20 additional starting points are randomly
> generated. With the switch "-historic-best", the best weights found in
> each prior iteration are used as starting points in addition to the
> random starting points.
>
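The starting points described in (2) could be assembled roughly as follows. This is only a sketch under assumptions: the function name, the uniform sampling, and the range [-1, 1] are hypothetical and not taken from the Moses source.

```python
import random


def starting_points(last_best, historic_best, num_random=20,
                    use_historic=True, low=-1.0, high=1.0, seed=0):
    """Collect the starting points for one optimization round.

    last_best     -- best weights from the previous iteration
    historic_best -- best weights of each prior iteration
    The sampling range [low, high] is an assumption for illustration.
    """
    rng = random.Random(seed)
    points = [list(last_best)]  # always start from the last best setting
    points += [[rng.uniform(low, high) for _ in last_best]
               for _ in range(num_random)]
    if use_historic:  # corresponds to the "-historic-best" switch
        points += [list(w) for w in historic_best]
    return points


points = starting_points(last_best=[0.1, 0.2],
                         historic_best=[[0.3, 0.4], [0.5, 0.6]])
print(len(points))
```

The sketch does not remove duplicates, so a setting that is both the last best and a historic best would be tried twice.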
> (3) A recent paper proposed an alternative to MERT that trains a classifier
> to predict which of two candidates in the n-best list is better. Candidates
> are randomly sampled (with a bias towards candidates with large metric
> score differences) and passed to a standard linear model classifier
> (maximum entropy, support vector machines, etc.). The current Moses
> implementation uses MegaM by Hal Daume (check for license terms).
> This alternative to traditional MERT can be used with the switch
> "-pairwise-ranked".
>
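To illustrate the sampling step in (3), here is a rough sketch of how training examples for the classifier might be built from one n-best list. The function name, the thresholds, and the toy candidates are hypothetical; the bias towards large score differences is implemented by sorting, which is one possible reading and not necessarily what the Moses implementation does. The classifier itself (MegaM in Moses) is left out.

```python
import random


def sample_pairs(candidates, num_samples=100, keep=10, min_diff=0.05, seed=0):
    """Build (feature difference, label) examples from one n-best list.

    candidates -- list of (feature_vector, metric_score) tuples
    All default values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_samples):
        (fa, sa), (fb, sb) = rng.sample(candidates, 2)
        if abs(sa - sb) >= min_diff:  # skip near-ties
            pairs.append((abs(sa - sb), fa, sa, fb, sb))
    # Prefer pairs with large metric score differences.
    pairs.sort(key=lambda p: p[0], reverse=True)
    examples = []
    for _, fa, sa, fb, sb in pairs[:keep]:
        diff = [a - b for a, b in zip(fa, fb)]
        label = 1 if sa > sb else -1
        examples.append((diff, label))
        examples.append(([-d for d in diff], -label))  # mirrored example
    return examples


# Toy n-best list: (feature vector, sentence-level metric score).
nbest = [([1.0, 0.0], 0.30), ([0.5, 0.5], 0.20), ([0.0, 1.0], 0.10)]
examples = sample_pairs(nbest)
print(len(examples))
```

A linear classifier trained on such examples yields a weight vector that can be used directly as the feature weights.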
> Notes:
>
> * the indicated switches are either specified when calling mert-moses.pl
>  or in the parameter "tuning-settings" in EMS.
>
> * option (3) is incompatible with (1) and (2), but the latter can be used
> together.
>
> * for "-number-of-random-directions" I used 50 random directions, which
>  slows down MERT quite a bit.
>
> * option (3) does not converge under the current Moses stopping criteria,
>  so it runs for 25 iterations, but you may want to reduce this to 10 with
>  the additional switch "-max-iterations 10"
>
> Some results:
> Urdu-English, SAMT Model
>
> MERT setting                 iterations       tuning set        test set
> baseline                     11.6 (std 4.8)   22.73 (std 0.07)  21.54 (std 0.38)
> 50 random directions         9.4 (std 2.3)    22.82 (std 0.14)  *21.58* (std 0.38)
> rand.dir. + historic best    9.2 (std 5.9)    22.79 (std 0.23)  21.40 (std 0.37)
> pairwise-ranked max-iter 10  10               -                 21.33 *(std 0.13)*
>
> Urdu-English, Hierarchical Model
>
> MERT setting                 iterations       tuning set        test set
> baseline                     8.8 (std 2.2)    23.91 (std 0.18)  *23.02* (std 0.42)
> 50 random directions         8.4 (std 3.3)    23.85 (std 0.35)  22.80 (std 0.70)
> rand.dir. + historic best    12.0 (std 3.5)   24.03 (std 0.23)  22.89 *(std 0.18)*
> pairwise-ranked max-iter 10  10               -                 21.93 (std 0.36)
>
> German-English, Phrase-based
>
> MERT setting                 iterations       tuning set        test set
> baseline                     7.2 (std 14.3)   24.82 (std 0.04)  *21.29* (std 0.05)
> rand.dir. + historic best    6.6 (std 1.8)    24.88 (std 0.07)  21.28 (std 0.16)
> pairwise-ranked max-iter 10  10               -                 *21.29 (std 0.02)*
>
> German-English, Factored Backoff
>
> MERT setting                 iterations       tuning set        test set
> baseline                     12.0 (std 15.2)  24.89 (std 0.25)  21.35 (std 0.15)
> rand.dir. + historic best    11.4 (std 7.6)   25.01 (std 0.12)  21.45 (std 0.12)
> pairwise-ranked              25               -                 *21.58 (std 0.11)*
> pairwise-ranked max-iter 10  10               -                 21.54 (std 0.10)
>
> Results are reported over 5 runs of each optimization method, in terms of
> average and standard deviation. What we are looking for is high test set
> scores and low variance.
>
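For anyone repeating the experiment, the per-method summary can be computed as below. The scores are made-up placeholders, not numbers from the tables above, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; the mail does not say which variant was used.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) over repeated tuning runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical test-set BLEU scores from 5 runs of one optimization method.
runs = [21.2, 21.5, 21.4, 21.6, 21.3]
mean, std = summarize(runs)
print(f"{mean:.2f} (std {std:.2f})")
```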
> The Urdu-English systems use a smaller tuning set of fewer than 1000
> sentences (with 4 references), so I would put less faith in those
> results. The test set for German-English is WMT 2011.
>
> Your mileage may vary, but it is worth a try.
>
> -phi
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>

