Hi John,

Thank you for your answer. That's really useful information, as I don't want to wait unnecessarily long for the tuning to complete. I suspected that the size of the tuning corpus wasn't dependent on the size of the training corpus. Now I have an idea of how large a tuning set is reasonable.

Yours,
Per Tunedal

On Fri, Mar 15, 2013, at 16:19, John D. Burger wrote:
> We did some experiments a long time ago on tuning set size (for Chinese
> to English). For the standard Moses setup, there are only a dozen or so
> meta-features to find weights for, so it's no surprise that improvements
> asymptote sharply once the tuning set grows much beyond 1000-2000
> segment pairs. (To answer one of your questions, Per, the size of the
> tuning set shouldn't have much, if anything, to do with the size of the
> phrase training dataset.)
>
> Of course, tuning algorithms like MIRA let you work efficiently with many
> more meta-features; see Chiang et al. 2009:
>
> http://www.aclweb.org/anthology/N/N09/N09-1025.pdf
>
> In this case you'd expect to continue finding improvements with much
> larger tuning sets.
>
> - John Burger
> MITRE
>
> On Mar 15, 2013, at 11:07, Barry Haddow wrote:
>
> > Hi Per
> >
> > We typically use tuning sets of 1000-3000 sentences, but recently have
> > been experimenting with larger sets (10k), which can give slightly better
> > results. It all depends on whether you care about that last 0.2 BLEU.
> > I don't think there's been any thorough investigation into tuning set
> > size, or its relation with training set size.
> >
> > batch-mira works well, sometimes better than mert, but not quicker. The
> > only reading is the Cherry and Foster paper, which contains a good
> > overview of tuning methods.
> >
> > I should also mention this presentation on discriminative training:
> > http://www.statmt.org/mtm12/pub/discriminative-mt.pdf
> >
> > cheers - Barry
> >
> > On 15/03/13 12:10, Per Tunedal wrote:
> >> Hi Barry,
> >> I've already looked at that page, but it didn't answer my questions.
> >>
> >> The most pertinent questions are practical:
> >> What's the recommended size of the tuning corpus?
> >> Is that size independent of the size of the training corpus, or not?
> >>
> >> But I'm also interested in the theoretical aspects.
> >>
> >> I've looked into the mert-moses.pl script:
> >> maximum-iterations=i : could be a shortcut if I don't want to wait
> >> forever. Any advice on a sensible limit for the iterations?
> >> threads=i : sounds useful. But you say that I probably won't need it.
> >> Why?
> >>
> >> Any experience of batch-mira? Pros and cons? Any reading?
> >>
> >> Yours,
> >> Per Tunedal
> >>
> >> On Fri, Mar 15, 2013, at 10:50, Barry Haddow wrote:
> >>> Hi Per
> >>>
> >>> There are a lot of questions in this email. I'd strongly recommend that
> >>> you have a look at this page
> >>> http://www.statmt.org/moses/?n=FactoredTraining.Tuning and the
> >>> references in it. But if you really want to understand tuning, you need
> >>> to read this book (http://www.statmt.org/book/), particularly chapter
> >>> 9.
> >>>
> >>> As to the memory/thread usage, Moses will use a single thread while
> >>> loading models, then multiple threads in decoding. The mert binary
> >>> (mert) shouldn't be resource-heavy in the default setting. It has its
> >>> own threads parameter, but you probably don't need it.
> >>>
> >>> Tuning stops when it no longer gets any improvement, typically after
> >>> 10-20 iterations, although there is an upper limit of 25 (configurable).
> >>>
> >>> cheers - Barry
> >>>
> >>> On 15/03/13 08:08, Per Tunedal wrote:
> >>>> Hi again,
> >>>> What does the tuning actually do? Does it translate and check the output
> >>>> against the actual translation in the target language file, trying
> >>>> different weights over and over again? No wonder it's time-consuming.
> >>>>
> >>>> Tuning needs a lot of memory too, compared to training, at least in one
> >>>> of the steps according to the system monitor: the step that uses only
> >>>> one thread, in spite of the parameter -threads. Which step? And why?
> >>>>
> >>>> I see some interesting files are created, with names like
> >>>> run8.best100.out. I suppose those are the most successful translations.
> >>>> How are they used in the tuning?
> >>>>
> >>>> The default tuner (?) is mert. How does mert actually work to make the
> >>>> tuning efficient? How are the weights to be tested chosen? Are there
> >>>> any shortcuts to take?
> >>>> How does it differ from other tuners (?)?
> >>>>
> >>>> Is anyone working on a different approach to tuning, to get improved
> >>>> tuning speed or improved translation quality?
> >>>>
> >>>> What's the recommended size of the tuning corpus? Is that size
> >>>> independent of the size of the training corpus? Does it depend on the
> >>>> tuner (?) used?
> >>>>
> >>>> Yours,
> >>>> Per Tunedal
> >>>>
> >>>> PS My tuning has just started round 8, after 20 hours of processing.
> >>>> Will it stop at 10 rounds, or what?
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Moses-support mailing list
> >>>> [email protected]
> >>>> http://mailman.mit.edu/mailman/listinfo/moses-support
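Per's questions about what tuning does and how mert chooses the weights to try can be illustrated with a toy sketch. This is not Moses code: it is a hypothetical, simplified Python illustration that keeps one fixed n-best list (in a real run, each outer iteration re-runs the decoder to produce a fresh n-best list, which is presumably what the run*.best100.out files hold), and it assumes a per-sentence error count summed over the corpus instead of BLEU. It does show the core idea of Och's MERT line search: along any search direction, each hypothesis score is linear in the step size, so the 1-best translation (and hence the error) can only change where two of those lines cross, and testing one point per interval finds the exact best step. The names `corpus_error`, `line_search`, `tune` and the toy data are invented for this sketch; `max_iterations=25` only mirrors the default limit Barry mentions.

```python
"""Toy MERT-style tuning over a fixed n-best list (illustrative sketch only)."""
from typing import List, Sequence, Tuple

# A hypothesis is (feature values, error count against the reference).
# ASSUMPTION: a per-sentence error count summed over the corpus stands in
# for BLEU, which is what Moses' mert actually optimises.
Hyp = Tuple[Sequence[float], int]

# Hypothetical toy data: 2 sentences x 3 hypotheses, 2 features each.
NBEST: List[List[Hyp]] = [
    [([1.0, 0.0], 2), ([0.0, 1.0], 0), ([0.5, 0.5], 1)],
    [([1.0, 0.2], 1), ([0.2, 1.0], 0), ([0.6, 0.6], 2)],
]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def corpus_error(weights: Sequence[float], nbest: List[List[Hyp]]) -> int:
    """Sum the errors of the highest-scoring hypothesis of each sentence."""
    return sum(max(hyps, key=lambda h: dot(weights, h[0]))[1] for hyps in nbest)


def line_search(weights: List[float], direction: List[float],
                nbest: List[List[Hyp]]) -> Tuple[List[float], int]:
    """Find the best step gamma along weights + gamma * direction.

    Each hypothesis scores a + gamma * b, a line in gamma, so the 1-best
    hypothesis (and the error) can only change where two lines cross.
    Testing one gamma per interval between crossings is therefore exact.
    """
    crossings = set()
    for hyps in nbest:
        lines = [(dot(weights, f), dot(direction, f)) for f, _ in hyps]
        for i, (a1, b1) in enumerate(lines):
            for a2, b2 in lines[i + 1:]:
                if b1 != b2:  # parallel lines never cross
                    crossings.add((a2 - a1) / (b1 - b2))
    pts = sorted(crossings)
    candidates: List[float] = []
    if pts:
        # One point below all crossings, one above, one inside each interval.
        candidates = [pts[0] - 1.0, pts[-1] + 1.0]
        candidates += [(x + y) / 2 for x, y in zip(pts, pts[1:])]
    best_gamma, best_err = 0.0, corpus_error(weights, nbest)
    for gamma in candidates:
        trial = [w + gamma * d for w, d in zip(weights, direction)]
        err = corpus_error(trial, nbest)
        if err < best_err:  # keep only strict improvements
            best_gamma, best_err = gamma, err
    new_weights = [w + best_gamma * d for w, d in zip(weights, direction)]
    return new_weights, best_err


def tune(weights: Sequence[float], nbest: List[List[Hyp]],
         max_iterations: int = 25) -> Tuple[List[float], int, int]:
    """Optimise one weight at a time; stop when a full pass does not help."""
    weights = list(weights)
    err = corpus_error(weights, nbest)
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        improved = False
        for k in range(len(weights)):
            direction = [1.0 if i == k else 0.0 for i in range(len(weights))]
            new_weights, new_err = line_search(weights, direction, nbest)
            if new_err < err:
                weights, err, improved = new_weights, new_err, True
        if not improved:
            break
    return weights, err, iteration


if __name__ == "__main__":
    w, err, passes = tune([1.0, 0.0], NBEST)
    print("weights:", w, "error:", err, "passes:", passes)
```

On the toy data, the summed error drops from 3 to 0 and the loop stops after the second pass, once a full pass finds no further improvement. The pairwise crossing computation is quadratic in the n-best size and was chosen here only for readability; practical implementations compute the upper envelope of the lines directly and typically add random starting points to avoid poor local optima.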
