The JHU summer workshop final report had some experiments on this:
http://www.learningace.com/doc/3098660/be148017730f3f3a7b45d656276b482a/jhu-summer-workshop-final-report
(See Fig. 6.7 and surrounding)

In general:
1) MERT works on so few features that you don't need much dev data to learn them
2) Dev data selection matters more than raw dev data size (e.g.,
sentence length, number of references). See, e.g., the thesis of Nitin
Madnani (2010) on the value of multiple references. This is especially
true if you're going to evaluate your system on a small test set.
3) The more features you have, the more dev data you need. This is a
serious limitation of most current discriminative training work, which
focuses on adding new features without (usually) rethinking how dev
sets are used.
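To make point 1 (and its flip side, point 3) concrete, here is a toy sketch of why a handful of features needs only a little dev data: tuning just searches a low-dimensional weight space for the setting that maximizes a corpus metric on the dev set. This is *not* Moses's actual MERT (which does exact line searches along directions rather than random search); the dev data, quality scores, and tuner below are all made up for illustration.

```python
import random

# Toy dev set: each sentence has a few candidate translations, each a
# (feature_vector, quality) pair, where quality stands in for a
# sentence-level BLEU-like score. Two features only, mimicking a
# small MERT-style feature set.
dev_set = [
    [((0.2, 0.9), 0.3), ((0.8, 0.1), 0.7)],
    [((0.5, 0.5), 0.6), ((0.1, 0.2), 0.2)],
]

def decode(weights, candidates):
    """Return the candidate with the highest weighted model score."""
    return max(candidates,
               key=lambda c: sum(w * f for w, f in zip(weights, c[0])))

def dev_score(weights):
    """Average quality of the 1-best candidates under these weights."""
    return sum(decode(weights, sent)[1] for sent in dev_set) / len(dev_set)

def tune(num_features=2, trials=1000, seed=0):
    """Random-search stand-in for MERT: try weight vectors, keep the
    one that maximizes the dev-set metric."""
    rng = random.Random(seed)
    best_w, best_s = None, float("-inf")
    for _ in range(trials):
        w = tuple(rng.uniform(-1, 1) for _ in range(num_features))
        s = dev_score(w)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s

weights, score = tune()
```

With only two free weights, even this two-sentence dev set pins down a good setting; as the feature count grows, the weight space grows with it and the same tiny dev set no longer constrains the optimum, which is the size-vs-features tension in points 1 and 3.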



On Mon, Apr 22, 2013 at 9:56 AM, Sara Stymne <[email protected]> wrote:
> Hi,
>
> Does anyone know of any published results which investigate the effect of
> the size of the tuning data set? I'm primarily interested in relation to
> MERT, but other optimization methods would also be interesting.
>
> Best,
> Sara
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
