Re: [Moses-support] Multi-threaded Tuning with mert

Per Tunedal Tue, 30 Apr 2013 05:41:39 -0700

Thank you, Tom!
The command bin/lmplz -o 3 -S 80% -T /tmp <text >text.arpa is just an
adaptation of what I found at:
http://kheafield.com/code/kenlm/estimation/


I suppose it's better to create the LM with KenLM from scratch, rather
than converting an IRSTLM model.
Yours,
Per Tunedal
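
A minimal sketch of the "from scratch" route, assuming the KenLM tools
lmplz and build_binary sit in the same bin directory. KENLM_BIN, CORPUS
and the output file names are placeholders, not paths from this thread.
By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch: estimate a 3-gram ARPA model with lmplz, then binarize it.
# KENLM_BIN, CORPUS, LM_ARPA and LM_BIN are hypothetical placeholders.
KENLM_BIN=${KENLM_BIN:-bin}
CORPUS=${CORPUS:-text}
LM_ARPA=${LM_ARPA:-text.arpa}
LM_BIN=${LM_BIN:-text.binary}

# Print the two commands, one per line (paths must not contain spaces).
lm_commands() {
    echo "$KENLM_BIN/lmplz -o 3 -S 80% -T /tmp <$CORPUS >$LM_ARPA"
    echo "$KENLM_BIN/build_binary $LM_ARPA $LM_BIN"
}

if [ "${RUN:-0}" = 1 ]; then
    # Feed the printed commands to a shell that stops at the first error.
    lm_commands | sh -e
else
    lm_commands
fi
```

Printing first makes it easy to check the paths before starting a long
estimation run.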

On Tue, Apr 30, 2013, at 12:10, Tom Hoar wrote:
> Per Tunedal,
> 
> It's not a matter of compiling Moses with kenlm instead of irstlm. By 
> default bjam compiles moses with kenlm. Sorry that I implied that you 
> need to do something extra. Adding --with-irstlm adds IRSTLM 
> functionality on top of kenlm. It doesn't hurt to include both in the 
> compile.
> 
> If you build your language model with SRILM or IRSTLM, you need to 
> convert their output to KenLM format. SRILM creates ARPA. IRSTLM creates 
> iARPA files that must be converted to ARPA files using their "compile-lm 
> --text yes" utility. Then, you convert the ARPA lm file to the KenLM 
> binary format. Finally, you need to configure your moses.ini file to 
> read the binarized KenLM file. Your moses.ini file can use LM code 8 or 
> 9 depending on what performance you're looking for. You can find 
> instructions for these last two steps here:
> 
> http://www.statmt.org/moses/?n=Moses.Optimize#ntoc14
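
The conversion steps described above could be sketched roughly as
follows. This is an illustration only: IRSTLM_BIN, KENLM_BIN and the
file names are placeholders, and the "--text yes" spelling is taken
from the message above (some IRSTLM versions may expect a different
form). The script prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: IRSTLM iARPA -> ARPA -> KenLM binary.
# IRSTLM_BIN, KENLM_BIN and all file names are hypothetical placeholders.
IRSTLM_BIN=${IRSTLM_BIN:-/path/to/irstlm/bin}
KENLM_BIN=${KENLM_BIN:-/path/to/mosesdecoder/bin}

# Print the conversion commands for one model.
# $1 = IRSTLM output (iARPA), $2 = ARPA file to create, $3 = KenLM binary to create
convert_commands() {
    echo "$IRSTLM_BIN/compile-lm --text yes $1 $2"
    echo "$KENLM_BIN/build_binary $2 $3"
}

if [ "${RUN:-0}" = 1 ]; then
    # Execute the printed commands, stopping at the first error.
    convert_commands lm.ilm.gz lm.arpa lm.binary | sh -e
else
    convert_commands lm.ilm.gz lm.arpa lm.binary
fi
```

The resulting binary file is the one moses.ini should then point to.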
> 
> I'm not familiar with the lmplz command. Is that the new KenLM tool to 
> build language models? If so, then the instructions above are 
> probably obsolete.
> 
> After doing the above, our command line to run mert-moses.pl looks like 
> this:
> 
> /usr/bin/perl -w /usr/local/bin/mert-moses.pl \
>     --config /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/run0.moses.ini \
>     --decoder /usr/local/bin/moses \
>     --decoder-flags "-v 0 -threads 2" \
>     --input /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.nl \
>     --maximum-iterations 25 \
>     --mertdir /usr/local/bin \
>     --nbest 100 \
>     --no-filter-phrase-table \
>     --refs /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.en \
>     --threads 2 \
>     --working-dir /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3
> 
> Notes:
> 1. This line does not use nohup, but it could.
> 2. We use the --no-filter-phrase-table option because we always binarize 
> the phrase/reordering tables and configure the moses.ini file to use
> them.
> 3. The "--threads 2" option (next to last line) does not affect the 
> operation of the moses binary. It tells the mert binary to run in 
> multi-threaded mode. I think both support the "all" value.
> 4. In your command line below, it's better to use an absolute/resolved 
> path instead of the ~ .
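
To make the two thread settings concrete, here is a small sketch that
assembles a mert-moses.pl command in the positional form used in the
quoted command further down. MOSES_HOME, N_THREADS and the file
arguments are placeholders. It only prints the command, so nothing is
started by accident.

```shell
#!/bin/sh
# Sketch: build a mert-moses.pl command with both thread options set.
# MOSES_HOME and N_THREADS are hypothetical placeholders; $HOME expands
# to an absolute path, which avoids an unexpanded "~" inside quotes.
MOSES_HOME=${MOSES_HOME:-$HOME/mosesdecoder}
N_THREADS=${N_THREADS:-4}

# $1 = tuning source file, $2 = reference file, $3 = moses.ini
mert_command() {
    printf '%s %s %s %s %s' \
        "$MOSES_HOME/scripts/training/mert-moses.pl" "$1" "$2" \
        "$MOSES_HOME/bin/moses" "$3"
    printf ' --mertdir %s' "$MOSES_HOME/bin"
    # Threads for the moses decoder, passed through --decoder-flags.
    printf ' --decoder-flags="-threads %s"' "$N_THREADS"
    # Threads for the mert binary itself.
    printf ' --threads %s\n' "$N_THREADS"
}

mert_command /path/to/tune.sv /path/to/tune.fr /path/to/moses.ini
```

The printed line can be checked and then pasted after "nohup nice" as
in the original command.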
> 
> Good luck.
> Tom
> 
> 
> 
> On 04/30/2013 02:04 PM, Per Tunedal wrote:
> > Hi,
> > very interesting indeed. After compiling with KenLM, instead of IRSTLM:
> > What should the tuning command look like?
> >
> > I ran the following (using IRSTLM):
> >
> > nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl
> > ~/corpora/Total1.sv-fr.clean.slutet_urval.sv
> > ~/corpora/Total1.sv-fr.clean.slutet_urval.fr  \
> >    ~/mosesdecoder/bin/moses  train/model/moses.ini
> >    --decoder-flags="-threads 4" -filtercmd
> >    '/home/per/mosesdecoder/scripts/training/filter-model-given-input.pl
> >    -Binarizer "~/mosesdecoder/bin/processPhraseTable"' --mertdir
> >    ~/mosesdecoder/bin/ &> mert.out &
> >
> > Should I just add --threads after mert-moses.pl ?
> >
> > Further "compile moses to use KenLM and configure the SMT model to use
> > KenLM":
> >
> > 1) compile moses to use KenLM: "KenLM is compiled by default." Should I
> > just remove the flag --with-irstlm=<root dir of the IRSTLM toolkit> ?
> > And add  8 <factor> <size> filename.arpa to moses.ini?
> >
> > 2) looking at http://kheafield.com/code/kenlm/ I suppose I can build a
> > KenLM 3-gram language model by:
> > bin/lmplz -o 3 -S 80% -T /tmp <text >text.arpa
> > Is there any more to it?
> >
> > Yours,
> > Per Tunedal
> >
> >
> > On Mon, Apr 29, 2013, at 17:49, Tom Hoar wrote:
> >> When you said "it didn't work," what do you mean? How many cores were on
> >> the tuning machine? You should also run mert-moses.pl with the --threads
> >> option so the mert binary runs multithreaded. That's in addition to the
> >> --decoder-flags "-threads all" option Ken mentioned, which tells the
> >> moses binary to run multithreaded.
> >>
> >> You also have to compile moses to use KenLM and configure the SMT model
> >> to use KenLM, not IRSTLM. IRSTLM is still single threaded. Most of the
> >> tuning time is moses creating the translations. Moses will run single
> >> threaded when configured with IRSTLM.
> >>
> >> Tom
> >>
> >>
> >> On 04/29/2013 10:33 PM, Arezki Sadoune wrote:
> >>> Dear All,
> >>>
> >>> I'm currently working on a phrase-based model from French to English.
> >>> Assuming that the bitext corpus is very large, is there any way to
> >>> use multi-threading for tuning?
> >>>
> >>> I have already tried in the past to tune a similar system, but it
> >>> took 30 days on a single core.
> >>>
> >>> I've actually tried multithreaded tuning by adding the option -threads
> >>> 16 to the mert script parameters (
> >>> /mosesdecoder/scripts/training/mert-moses.pl
> >>>   home/Moses/mosesdecoder/tunning1/tunning.true.fr
> >>> /home/Moses/mosesdecoder/tunning1/tunning.true.en
> >>> /home/Moses/mosesdecoder/bin/moses -threads 16 ...)
> >>>
> >>> but it didn't work.
> >>>
> >>> Thanks a lot
> >>>
> >>> Az
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> [email protected]
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
