Per Tunedal,

It's not a matter of compiling Moses with KenLM instead of IRSTLM. By default, bjam compiles Moses with KenLM. Sorry that I implied that you need to do something extra. Adding --with-irstlm adds IRSTLM functionality on top of KenLM. It doesn't hurt to include both in the compile.
If you build your language model with SRILM or IRSTLM, you need to convert their output to KenLM format. SRILM creates ARPA files. IRSTLM creates iARPA files that must first be converted to ARPA files using its "compile-lm --text yes" utility. Then you convert the ARPA LM file to the KenLM binary format. Finally, you need to configure your moses.ini file to read the binarized KenLM file. Your moses.ini file can use LM code 8 or 9, depending on what performance you're looking for. You can find instructions for these last two steps here:

http://www.statmt.org/moses/?n=Moses.Optimize#ntoc14

I'm not familiar with the lmplz command. Is that the new KenLM tool to build language models? If so, then the instructions above are probably obsolete.

After doing the above, our command line to run mert-moses.pl looks like this:

/usr/bin/perl -w /usr/local/bin/mert-moses.pl \
  --config /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/run0.moses.ini \
  --decoder /usr/local/bin/moses \
  --decoder-flags "-v 0 -threads 2" \
  --input /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.nl \
  --maximum-iterations 25 \
  --mertdir /usr/local/bin \
  --nbest 100 \
  --no-filter-phrase-table \
  --refs /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.en \
  --threads 2 \
  --working-dir /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3

Notes:

1. This line does not use nohup, but it could.
2. We use the --no-filter-phrase-table option because we always binarize the phrase/reordering tables and configure the moses.ini file to use them.
4. The "--threads 2" option (next to last line) does not affect the operation of the moses binary. It tells the mert binary to run in multi-threaded mode. I think both support the "all" value.
5. In your command line below, it's better to use an absolute/resolved path instead of ~.
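For reference, the conversion steps described above (iARPA to ARPA, ARPA to KenLM binary, then the moses.ini entry) might look roughly like the sketch below. The file names (lm.ilm.gz, lm.arpa, lm.binlm), the /path/to prefix, the factor 0 and the order 3 are placeholders I made up for illustration, and the run and lm_line helpers are hypothetical. I'm assuming KenLM's build_binary tool is on the PATH and that compile-lm takes the input and output files as its last two arguments. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: every file name here is a placeholder, not a path from our setup.

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: format a [lmodel-file] entry for moses.ini as
# <lm-code> <factor> <order> <file>.
lm_line() {
    echo "$1 $2 $3 $4"
}

# 1. IRSTLM only: convert iARPA to plain ARPA (SRILM already writes ARPA).
run compile-lm --text yes lm.ilm.gz lm.arpa

# 2. Convert the ARPA file to the KenLM binary format.
run build_binary lm.arpa lm.binlm

# 3. Entry for the [lmodel-file] section of moses.ini. LM code 8 is used
#    here as an example; code 9 is the other KenLM option mentioned above.
lm_line 8 0 3 /path/to/lm.binlm
```

If you build the ARPA file with SRILM, skip step 1 and start from step 2.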
Good luck.
Tom

On 04/30/2013 02:04 PM, Per Tunedal wrote:
> Hi,
> very interesting indeed. After compiling with KenLM, instead of IRSTLM:
> What should the tuning command look like?
>
> I ran the following (using IRSTLM):
>
> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl
> ~/corpora/Total1.sv-fr.clean.slutet_urval.sv
> ~/corpora/Total1.sv-fr.clean.slutet_urval.fr \
> ~/mosesdecoder/bin/moses train/model/moses.ini
> --decoder-flags="-threads 4" -filtercmd
> '/home/per/mosesdecoder/scripts/training/filter-model-given-input.pl
> -Binarizer "~/mosesdecoder/bin/processPhraseTable"' --mertdir
> ~/mosesdecoder/bin/ &> mert.out &
>
> Should I just add --threads after mer-moses.pl ?
>
> Further "compile moses to use KenLM and configure the SMT model to use
> KenLM":
>
> 1) compile moses to use KenLM: "KenLM is compiled by default." Should I
> just remove the flag --with-irstlm=<root dir of the IRSTLM toolkit> ?
> And add 8 <factor> <size> filename.arpa to moses.ini?
>
> 2) looking at http://kheafield.com/code/kenlm/ I suppose I can build a
> KenLM 3-gram language model by:
> bin/lmplz -o 3 -S 80% -T /tmp <text >text.arpa
> Is there any more to it?
>
> Yours,
> Per Tunedal
>
>
> On Mon, Apr 29, 2013, at 17:49, Tom Hoar wrote:
>> When you said "it didn't work," what do you mean? How many cores were on
>> the tuning machine? You should also run mert-moses.pl with the --threads
>> option so the mert binary runs multithreaded. That's in addition to the
>> --decoder-flags "-threads all" option Ken mentioned, which tells the
>> moses binary to run multithreaded.
>>
>> You also have to compile moses to use KenLM and configure the SMT model
>> to use KenLM, not IRSTLM. IRSTLM is still single threaded. Most of the
>> tuning time is moses creating the translations. Moses will run single
>> threaded when configured IRSTLM.
>>
>> Tom
>>
>>
>> On 04/29/2013 10:33 PM, Arezki Sadoune wrote:
>>> Dear All,
>>>
>>> I'm currently working on a Phrase-based model from french to english.
>>> Assuming that the bitext corpora is very large, is there any way to
>>> use the multi-thread for the tuning purpose?
>>>
>>> I've already tried by the past to tune a similar system but it has
>>> taken 30 days on a single core.
>>>
>>> I've actually tried multithreaded tuning by adding the option -threads
>>> 16 to the mert script parameter (
>>> /mosesdecoder/scripts/training/mert-moses.pl
>>> home/Moses/mosesdecoder/tunning1/tunning.true.fr
>>> /home/Moses/mosesdecoder/tunning1/tunning.true.en
>>> /home/Moses/mosesdecoder/bin/moses -threads 16 ...)
>>>
>>> but it didn't work.
>>>
>>> Thanks a lot
>>>
>>> Az
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
