Hieu, I'm running something like:

$ ...mosesdecoder/scripts/training/mert-moses.pl ...tuning.f ...tuning.e \
    --no-filter-phrase-table --decoder-flags="-threads 32" --nbest=100 \
    ...mosesdecoder/bin/moses ...moses.ini --mertdir ...mosesdecoder/bin/ \
    --rootdir ...mosesdecoder/scripts --working-dir ...tuning &> ...mert.out &
moses.ini looks like this:

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
0

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=...phrase-table.minphr input-factor=0 output-factor=0
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=...lm.blm.lm order=5

# dense weights for feature functions
[weight]
UnknownWordPenalty0= 0
WordPenalty0= 0
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
Distortion0= 0
LM0= 0.5

Happy to share my data, but not sure how. My language model is 6+ GB in binary form.

Bogdan

On Mon, Aug 1, 2016 at 12:55 PM, Hieu Hoang <[email protected]> wrote:
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 1 August 2016 at 20:40, Bogdan Vasilescu <[email protected]> wrote:
>>
>> Thanks Hieu,
>>
>> It runs out of memory around 3,000 sentences when n-best is the
>> default 100. It seems to do a little bit better if I set n-best to 10
>> (5,000 sentences or so). The machine I'm running this on has 192 GB
>> RAM. I'm using the binary Moses from
>> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
>>
>> My phrase table was built on 1,200,000 sentences (phrase length at
>> most 20). My language model is a 5-gram, built on close to 500,000,000
>> sentences.
>
> I can't see why it would run out of memory. If you can make your model
> available for download and tell me the exact command you ran, maybe I can
> try to replicate it.
>>
>> Still, the question remains. Is there a way to perform tuning
>> incrementally?
>
> I think what you proposed is doable.
> I don't know whether it would improve over the baseline.
>>
>> I'm thinking:
>> - tune on a sample of my original tuning corpora; this generates an
>>   updated moses.ini, with "better" weights
>> - use this moses.ini as input for a second tuning phase, on another
>>   sample of my tuning corpora
>> - repeat until there is convergence in the weights
>>
>> Bogdan
>>
>> On Mon, Aug 1, 2016 at 11:43 AM, Hieu Hoang <[email protected]> wrote:
>> >
>> > Hieu Hoang
>> > http://www.hoang.co.uk/hieu
>> >
>> > On 29 July 2016 at 18:57, Bogdan Vasilescu <[email protected]> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I've trained a model and I'm trying to tune it using mert-moses.pl.
>> >>
>> >> I tried different size tuning corpora, and as soon as I exceed a
>> >> certain size (this seems to vary between consecutive runs, as well as
>> >> with other tuning parameters like --nbest), the process gets killed:
>> >
>> > It should work with any size tuning corpus. The only thing I can think
>> > of is that if the tuning corpus is very large (say 1,000,000 sentences)
>> > or the n-best list is very large (say 1,000,000), then the decoder or
>> > the MERT script may use a lot of memory.
>> >>
>> >> Killed
>> >> Exit code: 137
>> >> The decoder died. CONFIG WAS -weight-overwrite ...
>> >>
>> >> Looking into the kernel logs in /var/log/kern.log suggests I'm running
>> >> out of memory:
>> >>
>> >> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
>> >> 992 or sacrifice child
>> >> kernel: [98464.080920] Killed process 15848 (moses)
>> >> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
>> >>
>> >> Is there a way to perform tuning incrementally?
>> >>
>> >> I'm thinking:
>> >> - tune on a sample of my original tuning corpora; this generates an
>> >>   updated moses.ini, with "better" weights
>> >> - use this moses.ini as input for a second tuning phase, on another
>> >>   sample of my tuning corpora
>> >> - repeat until there is convergence in the weights
>> >>
>> >> Would this work?
>> >>
>> >> Many thanks in advance,
>> >> Bogdan

--
Bogdan (博格丹) Vasilescu
Postdoctoral Researcher
Davis Eclectic Computational Analytics Lab
University of California, Davis
http://bvasiles.github.io
http://decallab.cs.ucdavis.edu/
@b_vasilescu

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
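[A sketch of the incremental tuning loop proposed above: split the aligned tuning corpus into chunks, tune on each chunk in turn, and feed each round's updated moses.ini into the next. All paths, the chunk size, and the toy corpus below are placeholders, and the mert-moses.pl invocation is only echoed rather than executed, since the decoder and scripts are not assumed to be installed here.]

```shell
set -e
WORK=$(mktemp -d)

# Toy aligned corpus standing in for tuning.f / tuning.e (line-aligned).
printf 'f1\nf2\nf3\nf4\n' > "$WORK/tuning.f"
printf 'e1\ne2\ne3\ne4\n' > "$WORK/tuning.e"

# Split both sides into aligned chunks of CHUNK sentences each.
# Splitting by line count keeps source/target sentence pairs aligned.
CHUNK=2
split -l "$CHUNK" -d "$WORK/tuning.f" "$WORK/sample.f."
split -l "$CHUNK" -d "$WORK/tuning.e" "$WORK/sample.e."

# Start each round from the previous round's tuned config.
INI=moses.ini
for f in "$WORK"/sample.f.*; do
  n=${f##*.}                       # numeric suffix, e.g. 00, 01
  e="$WORK/sample.e.$n"
  # In a real run this would be the actual mert-moses.pl call from the
  # thread; it is echoed here as a placeholder.
  echo "mert-moses.pl $f $e ...mosesdecoder/bin/moses $INI --working-dir $WORK/mert.$n"
  INI="$WORK/mert.$n/moses.ini"    # feed tuned weights into the next round
done
```

Whether the weights converge to something as good as a single large tuning run is an open question, as noted in the thread; this only shows the mechanics of chaining the moses.ini files.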
