Thank you, Tom!

The command
bin/lmplz -o 3 -S 80% -T /tmp <text >text.arpa
is just an adaptation of what I found at:
http://kheafield.com/code/kenlm/estimation/

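For reference, here is a minimal sketch of how that command might be combined with the binarization step Tom describes below. It is only an illustration under assumptions: the install location, the corpus name and the output name text.binlm are hypothetical, and build_binary is the KenLM binarization tool as I understand it, not something named in this thread. The script only prints the commands unless the tools and the corpus are actually present.

```shell
#!/bin/sh
# Sketch only. All paths and file names here are assumptions, not taken
# from a real setup; adjust them before use.
KENLM_BIN="$HOME/mosesdecoder/bin"  # assumed location of lmplz and build_binary
CORPUS="text"                       # training text, one sentence per line
ARPA="text.arpa"                    # ARPA output written by lmplz
BLM="text.binlm"                    # hypothetical name for the binarized model

# Step 1: estimate a 3-gram model (the command quoted in this message).
ESTIMATE_CMD="$KENLM_BIN/lmplz -o 3 -S 80% -T /tmp <$CORPUS >$ARPA"
# Step 2: convert the ARPA file to the KenLM binary format.
BINARIZE_CMD="$KENLM_BIN/build_binary $ARPA $BLM"

if [ -x "$KENLM_BIN/lmplz" ] && [ -x "$KENLM_BIN/build_binary" ] && [ -r "$CORPUS" ]; then
    # Tools and corpus are present: run both steps, stopping on the first failure.
    "$KENLM_BIN/lmplz" -o 3 -S 80% -T /tmp <"$CORPUS" >"$ARPA" &&
        "$KENLM_BIN/build_binary" "$ARPA" "$BLM"
else
    # Otherwise only print what would be run.
    echo "would run: $ESTIMATE_CMD"
    echo "would run: $BINARIZE_CMD"
fi
```

The moses.ini file would then point at the binarized file rather than the ARPA file, using LM code 8 or 9 as Tom describes.
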
I suppose it's better to create the LM with KenLM from scratch, rather
than converting an IRSTLM model.

Yours,
Per Tunedal

On Tue, Apr 30, 2013, at 12:10, Tom Hoar wrote:
> Per Tunedal,
>
> It's not a matter of compiling Moses with kenlm instead of irstlm. By
> default bjam compiles moses with kenlm. Sorry that I implied that you
> need to do something extra. Adding --with-irstlm adds IRSTLM
> functionality on top of kenlm. It doesn't hurt to include both in the
> compile.
>
> If you build your language model with SRILM or IRSTLM, you need to
> convert their output to KenLM format. SRILM creates ARPA. IRSTLM creates
> iARPA files that must be converted to ARPA files using their "compile-lm
> --text yes" utility. Then, you convert the ARPA lm file to the KenLM
> binary format. Finally, you need to configure your moses.ini file to
> read the binarized KenLM file. Your moses.ini file can use LM code 8 or
> 9 depending on what performance you're looking for. You can find
> instructions for these last two steps here:
>
> http://www.statmt.org/moses/?n=Moses.Optimize#ntoc14
>
> I'm not familiar with the lmplz command. Is that the new KenLM tool to
> build language models? If so, then the instructions above are
> probably obsolete.
>
> After doing the above, our command line to run mert-moses.pl looks like
> this:
>
> /usr/bin/perl -w /usr/local/bin/mert-moses.pl \
> --config /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/run0.moses.ini \
> --decoder /usr/local/bin/moses \
> --decoder-flags "-v 0 -threads 2" \
> --input /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.nl \
> --maximum-iterations 25 \
> --mertdir /usr/local/bin \
> --nbest 100 \
> --no-filter-phrase-table \
> --refs /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.en \
> --threads 2 \
> --working-dir /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3
>
> Notes:
> 1. This line does not use nohup, but it could.
> 2. We use the --no-filter-phrase-table option because we always binarize
> the phrase/reordering tables and configure the moses.ini file to use
> them.
> 4. The "--threads 2" option (next to last line) does not affect the
> operation of the moses binary. It tells the mert binary to run in
> multi-threaded mode. I think both support the "all" value.
> 5. In your command line below, it's better to use an absolute/resolved
> path instead of the ~ .
>
> Good luck.
> Tom
>
>
> On 04/30/2013 02:04 PM, Per Tunedal wrote:
> > Hi,
> > very interesting indeed. After compiling with KenLM, instead of IRSTLM:
> > What should the tuning command look like?
> >
> > I ran the following (using IRSTLM):
> >
> > nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
> > ~/corpora/Total1.sv-fr.clean.slutet_urval.sv \
> > ~/corpora/Total1.sv-fr.clean.slutet_urval.fr \
> > ~/mosesdecoder/bin/moses train/model/moses.ini \
> > --decoder-flags="-threads 4" \
> > -filtercmd '/home/per/mosesdecoder/scripts/training/filter-model-given-input.pl -Binarizer "~/mosesdecoder/bin/processPhraseTable"' \
> > --mertdir ~/mosesdecoder/bin/ &> mert.out &
> >
> > Should I just add --threads after mert-moses.pl?
> >
> > Further "compile moses to use KenLM and configure the SMT model to use
> > KenLM":
> >
> > 1) compile moses to use KenLM: "KenLM is compiled by default." Should I
> > just remove the flag --with-irstlm=<root dir of the IRSTLM toolkit> ?
> > And add 8 <factor> <size> filename.arpa to moses.ini?
> >
> > 2) looking at http://kheafield.com/code/kenlm/ I suppose I can build a
> > KenLM 3-gram language model by:
> > bin/lmplz -o 3 -S 80% -T /tmp <text >text.arpa
> > Is there any more to it?
> >
> > Yours,
> > Per Tunedal
> >
> >
> > On Mon, Apr 29, 2013, at 17:49, Tom Hoar wrote:
> >> When you said "it didn't work," what do you mean? How many cores were on
> >> the tuning machine? You should also run mert-moses.pl with the --threads
> >> option so the mert binary runs multithreaded. That's in addition to the
> >> --decoder-flags "-threads all" option Ken mentioned, which tells the
> >> moses binary to run multithreaded.
> >>
> >> You also have to compile moses to use KenLM and configure the SMT model
> >> to use KenLM, not IRSTLM. IRSTLM is still single threaded. Most of the
> >> tuning time is moses creating the translations. Moses will run single
> >> threaded when configured with IRSTLM.
> >>
> >> Tom
> >>
> >>
> >> On 04/29/2013 10:33 PM, Arezki Sadoune wrote:
> >>> Dear All,
> >>>
> >>> I'm currently working on a phrase-based model from French to English.
> >>> Assuming that the bitext corpus is very large, is there any way to
> >>> use multithreading for tuning?
> >>>
> >>> I have already tried in the past to tune a similar system, but it
> >>> took 30 days on a single core.
> >>>
> >>> I've actually tried multithreaded tuning by adding the option -threads
> >>> 16 to the mert script parameters (
> >>> /mosesdecoder/scripts/training/mert-moses.pl
> >>> home/Moses/mosesdecoder/tunning1/tunning.true.fr
> >>> /home/Moses/mosesdecoder/tunning1/tunning.true.en
> >>> /home/Moses/mosesdecoder/bin/moses -threads 16 ...)
> >>>
> >>> but it didn't work.
> >>>
> >>> Thanks a lot
> >>>
> >>> Az
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> [email protected]
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
