MERT is trying to solve a convex optimisation task. However, it never sees the full landscape: it only sees a fraction of it (the n-best lists), and that fraction changes from round to round. Since that fraction is a function of the initial conditions, if MERT is randomised to begin with, it can end up at some local maximum.
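To make the point about initial conditions concrete, here is a minimal sketch. It assumes a toy one-dimensional score in place of MERT's real objective, and the names `objective`, `hill_climb` and `best_of_restarts` are made up for illustration; none of this is Moses code. It climbs greedily from random starting points, so different starts can stop at different local maxima, while a fixed seed makes a re-run repeat itself exactly.

```python
import math
import random


def objective(x):
    # Toy non-convex score with several local maxima (a stand-in, not BLEU).
    return math.sin(3.0 * x) + 0.5 * math.sin(7.0 * x)


def hill_climb(rng, lo=0.0, hi=6.0, step=0.01, max_iters=10_000):
    # Start at a random point, then move greedily uphill until stuck.
    x = rng.uniform(lo, hi)
    for _ in range(max_iters):
        candidates = [c for c in (x - step, x + step) if lo <= c <= hi]
        best = max(candidates, key=objective)
        if objective(best) <= objective(x):
            break  # no neighbour is better: a local maximum, to within one step
        x = best
    return x, objective(x)


def best_of_restarts(seed, restarts=20):
    # One seeded generator drives every restart, so a re-run is identical.
    rng = random.Random(seed)
    runs = [hill_climb(rng) for _ in range(restarts)]
    return max(runs, key=lambda r: r[1]), runs


if __name__ == "__main__":
    (x, score), runs = best_of_restarts(seed=1234)
    print("distinct end points:", len({round(r[0], 1) for r in runs}))
    print("best of all restarts:", round(x, 2), round(score, 3))
```

Fixing the seed only buys repeatability; it says nothing about whether the end point is a good one, which is why the sketch keeps the best of several restarts rather than trusting a single run.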
Making sure that MERT starts with the same randomisation would perhaps make it deterministic (in the sense that subsequent runs of the same system should produce the same results). This, however, is really a hack, since you have no way of knowing that the initial settings (and hence the final outcome) are actually any good. The correct approach in this case would be to run it from multiple random starting points and either average the results or take the best setting. The better approach would be not to use MERT at all, but instead to do proper convex optimisation. I leave this as an exercise for the reader ...

Miles

On 21/01/2008, Daniel Déchelotte <[EMAIL PROTECTED]> wrote:
>
> Miles Osborne wrote:
>
> > Chris is correct --MERT has no guarantees that it will produce the
> > same results between runs (even when starting from the same training
> > conditions). This is in part because MERT does not find the global
> > optimum (remember it is not considering the full space of possible
> > translations, but rather uses n-best lists).
>
> What part of MERT is not deterministic? Hopefully, there is a way to
> make it so (by explicitly initialising the random seed to some known
> value, for example). I would feel safer with a fully deterministic
> procedure :-] (the first run may be "unpredictable", but any re-run
> provides the exact same result). In my experience, this is already the
> case. Could it be system-specific, then?
>
> Thanks,
> -- Daniel
>
> > However, you can reuse weights between runs for development
> > experiments if you are just changing a single feature function. You
> > may not get the best possible results, but your experiments should be
> > in the right area. Naturally, you will eventually need to re-run MERT
> > to 'sync' your model.
> >
> > Miles
> >
> > On 21/01/2008, Daniel Déchelotte <[EMAIL PROTECTED]> wrote:
> > > Chris Dyer wrote:
> > > > menor bangget wrote:
> > > >
> > > > > 2. If I train the same corpus twice, using 2 different word
> > > > > alignment, e.g., union and grow-diag-final, will I get different
> > > > > weight after running mert-moses.pl; or it will be the same
> > > > > because I used exactly the same corpus?
> > > >
> > > > MERT is a non-deterministic algorithm and so you'll see different
> > > > weights from run to run, even with the exact same alignment
> > > > heuristics.
> > >
> > > AFAIK, mert picks some points at random indeed, but it picks the
> > > exact same points from run to run (on the same data). In other
> > > words, rerunning it on the same models (same data + same training
> > > sequence) will provide the same results.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
