Hieu,

I'm running something like:
$ ...mosesdecoder/scripts/training/mert-moses.pl ...tuning.f
...tuning.e --no-filter-phrase-table --decoder-flags="-threads 32"
--nbest=100 ...mosesdecoder/bin/moses ...moses.ini --mertdir
...mosesdecoder/bin/ --rootdir ...mosesdecoder/scripts --working-dir
...tuning &> ...mert.out &

moses.ini looks like this:

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
0

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryCompact name=TranslationModel0 num-features=4
path=...phrase-table.minphr input-factor=0 output-factor=0
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=...lm.blm.lm order=5

# dense weights for feature functions
[weight]
UnknownWordPenalty0= 0
WordPenalty0= 0
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
Distortion0= 0
LM0= 0.5

Happy to share my data, but not sure how. My language model is 6+GB in
binary form.
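As for the "repeat until convergence in the weights" step of the incremental scheme quoted below, what I have in mind is comparing the dense [weight] vectors from successive moses.ini files and stopping once no weight moves more than some tolerance. A rough sketch (the parsing matches the ini layout above; the 1e-3 tolerance is an arbitrary choice of mine, not anything mert-moses.pl uses):

```python
# Illustrative convergence check between two moses.ini [weight] sections.
# Assumes the format shown above: "Name0= w1 w2 ..." lines under [weight].
# The tol value is a made-up example threshold.

def parse_weights(ini_text):
    """Return a flat list of floats from the [weight] section."""
    weights = []
    in_section = False
    for line in ini_text.splitlines():
        line = line.strip()
        if line.startswith('['):
            in_section = (line == '[weight]')
            continue
        if in_section and '=' in line:
            _, values = line.split('=', 1)
            weights.extend(float(v) for v in values.split())
    return weights

def converged(prev_ini, curr_ini, tol=1e-3):
    """True when every dense weight moved by at most tol between runs."""
    prev, curr = parse_weights(prev_ini), parse_weights(curr_ini)
    return len(prev) == len(curr) and \
        max(abs(p - c) for p, c in zip(prev, curr)) <= tol
```

So the loop would be: run mert-moses.pl on a sample, read back the moses.ini it writes, and stop once converged() holds against the previous round's ini.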

Bogdan


On Mon, Aug 1, 2016 at 12:55 PM, Hieu Hoang <[email protected]> wrote:
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 1 August 2016 at 20:40, Bogdan Vasilescu <[email protected]> wrote:
>>
>> Thanks Hieu,
>>
>> It runs out of memory around 3,000 sentences when n-best is the
>> default 100. It seems to do a little bit better if I set n-best to 10
>> (5,000 sentences or so). The machine I'm running this on has 192 GB
>> RAM. I'm using the binary moses from
>> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
>>
>> My phrase table was built on 1,200,000 sentences (phrase length at
>> most 20). My language model is a 5-gram, built on close to 500,000,000
>> sentences.
>
> I can't see why it would run out of memory. If you can make your model
> available for download and tell me the exact command you ran, maybe I can
> try to replicate it.
>>
>>
>> Still, the question remains. Is there a way to perform tuning
>> incrementally?
>
> I think what you proposed is doable. I don't know whether it would improve
> over the baseline.
>>
>>
>> I'm thinking:
>> - tune on a sample of my original tuning corpora; this generates an
>> updated moses.ini, with "better" weights
>> - use this moses.ini as input for a second tuning phase, on another
>> sample of my tuning corpora
>> - repeat until there is convergence in the weights
>>
>> Bogdan
>>
>>
>> On Mon, Aug 1, 2016 at 11:43 AM, Hieu Hoang <[email protected]> wrote:
>> >
>> >
>> > Hieu Hoang
>> > http://www.hoang.co.uk/hieu
>> >
>> > On 29 July 2016 at 18:57, Bogdan Vasilescu <[email protected]> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I've trained a model and I'm trying to tune it using mert-moses.pl.
>> >>
>> >> I tried different size tuning corpora, and as soon as I exceed a
>> >> certain size (this seems to vary between consecutive runs, as well as
>> >> with other tuning parameters like --nbest), the process gets killed:
>> >
> it should work with any size tuning corpus. The only thing I can think of
> is that if the tuning corpus is very large (say 1,000,000 sentences) or
> the n-best list is very large (say 1,000,000), then the decoder or the
> mert script may use a lot of memory
>> >>
>> >>
>> >> Killed
>> >> Exit code: 137
>> >> The decoder died. CONFIG WAS -weight-overwrite ...
>> >>
>> >> Looking into the kernel logs in /var/log/kern.log suggests I'm running
>> >> out of memory:
>> >>
>> >> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
>> >> 992 or sacrifice child
>> >> kernel: [98464.080920] Killed process 15848 (moses)
>> >> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
>> >>
>> >> Is there a way to perform tuning incrementally?
>> >>
>> >> I'm thinking:
>> >> - tune on a sample of my original tuning corpora; this generates an
>> >> updated moses.ini, with "better" weights
>> >> - use this moses.ini as input for a second tuning phase, on another
>> >> sample of my tuning corpora
>> >> - repeat until there is convergence in the weights
>> >>
>> >> Would this work?
>> >>
>> >> Many thanks in advance,
>> >> Bogdan
>> >>
>> >> --
>> >> Bogdan (博格丹) Vasilescu
>> >> Postdoctoral Researcher
>> >> Davis Eclectic Computational Analytics Lab
>> >> University of California, Davis
>> >> http://bvasiles.github.io
>> >> http://decallab.cs.ucdavis.edu/
>> >> @b_vasilescu
>> >>
>> >> _______________________________________________
>> >> Moses-support mailing list
>> >> [email protected]
>> >> http://mailman.mit.edu/mailman/listinfo/moses-support
>> >
>> >
>>
>>
>>
>
>



