Per Tunedal,

It's not a matter of compiling Moses with kenlm instead of irstlm. By 
default bjam compiles moses with kenlm. Sorry that I implied that you 
need to do something extra. Adding --with-irstlm adds IRSTLM 
functionality on top of kenlm. It doesn't hurt to include both in the 
compile.

If you build your language model with SRILM or IRSTLM, you need to 
convert their output to KenLM format. SRILM creates ARPA. IRSTLM creates 
iARPA files that must be converted to ARPA files using their "compile-lm 
--text yes" utility. Then, you convert the ARPA lm file to the KenLM 
binary format. Finally, you need to configure your moses.ini file to 
read the binarized KenLM file. Your moses.ini file can use LM code 8 or 
9 depending on what performance you're looking for. You can find 
instructions for these last two steps here:

http://www.statmt.org/moses/?n=Moses.Optimize#ntoc14
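The conversion steps above can be sketched roughly as follows. This is only a dry-run outline, not a tested recipe: the file names, the run and lm_ini_line helpers, and the /path/to prefix are hypothetical placeholders, and the moses.ini line assumes the older [lmodel-file] syntax (LM code, factor, order, file) that the numeric codes 8 and 9 belong to. By default it only prints the commands; set DRY_RUN=0 to execute them, assuming compile-lm and build_binary are on your PATH.

```shell
#!/bin/sh
# Sketch of the IRSTLM -> ARPA -> KenLM binary conversion.
# All file names below are hypothetical placeholders.
IARPA=lm.iarpa     # IRSTLM output (iARPA)
ARPA=lm.arpa       # plain ARPA file
BINLM=lm.binlm     # binarized KenLM file
ORDER=3            # n-gram order of the model

# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# 1. IRSTLM only: convert iARPA to ARPA (SRILM already writes ARPA).
run compile-lm --text yes "$IARPA" "$ARPA"

# 2. Convert the ARPA file to the KenLM binary format.
run build_binary "$ARPA" "$BINLM"

# 3. Print the line to put under [lmodel-file] in moses.ini.
#    Arguments: LM code (8 or 9), n-gram order, path to the binary LM.
#    The factor is assumed to be 0 (surface form).
lm_ini_line() {
    printf '%s 0 %s %s\n' "$1" "$2" "$3"
}
lm_ini_line 8 "$ORDER" "/path/to/$BINLM"
```

Swapping the 8 for a 9 in the last call gives the other LM code mentioned above.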

I'm not familiar with the lmplz command. Is that the new KenLM tool for 
building language models? If so, the instructions above are probably 
obsolete.

After doing the above, our command line to run mert-moses.pl looks like 
this:

/usr/bin/perl -w /usr/local/bin/mert-moses.pl \
    --config /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/run0.moses.ini \
    --decoder /usr/local/bin/moses \
    --decoder-flags "-v 0 -threads 2" \
    --input /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.nl \
    --maximum-iterations 25 \
    --mertdir /usr/local/bin \
    --nbest 100 \
    --no-filter-phrase-table \
    --refs /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3/mert1.en \
    --threads 2 \
    --working-dir /opt/domy/TRAININGS/merts/mert-mert1-s=nl-t=en-p=domt_tm-a=giza-g=3-l=domt_lm-T=irstlmken-n=3

Notes:
1. This line does not use nohup, but it could.
2. We use the --no-filter-phrase-table option because we always binarize 
the phrase/reordering tables and configure the moses.ini file to use them.
3. The "--threads 2" option (next-to-last line) does not affect the 
moses binary. It tells the mert binary to run multithreaded. I think 
both binaries support the "all" value.
4. In your command line below, it's better to use an absolute (resolved) 
path instead of ~.
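To illustrate the last two notes, here is a small sketch that only prints a tuning command rather than running it. The directory layout, the dev.sv/dev.fr file names, and the mert_cmd helper are hypothetical; the positional argument order (input, references, decoder, config) follows your command below. It shows that a tilde inside quotes stays literal, and that the decoder threads and the mert threads are set by two separate options.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your installation.
MOSES_BIN="$HOME/mosesdecoder/bin"
SCRIPTS="$HOME/mosesdecoder/scripts/training"

# The shell does not expand a tilde inside quotes, so this stays literal:
literal="~/mosesdecoder/bin"
echo "quoted tilde: $literal"
echo "resolved:     $MOSES_BIN"

# Print (do not run) a tuning command.
#   $1 = threads for the moses decoder (passed via --decoder-flags)
#   $2 = threads for the mert binary   (passed via --threads)
mert_cmd() {
    printf '%s\n' "$SCRIPTS/mert-moses.pl dev.sv dev.fr $MOSES_BIN/moses model/moses.ini --mertdir $MOSES_BIN --decoder-flags \"-threads $1\" --threads $2"
}
mert_cmd 4 4
```

Both thread counts are independent, so they do not have to match.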

Good luck.
Tom



On 04/30/2013 02:04 PM, Per Tunedal wrote:
> Hi,
> very interesting indeed. After compiling with KenLM, instead of IRSTLM:
> What should the tuning command look like?
>
> I ran the following (using IRSTLM):
>
> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl
> ~/corpora/Total1.sv-fr.clean.slutet_urval.sv
> ~/corpora/Total1.sv-fr.clean.slutet_urval.fr  \
>    ~/mosesdecoder/bin/moses  train/model/moses.ini
>    --decoder-flags="-threads 4" -filtercmd
>    '/home/per/mosesdecoder/scripts/training/filter-model-given-input.pl
>    -Binarizer "~/mosesdecoder/bin/processPhraseTable"' --mertdir
>    ~/mosesdecoder/bin/ &> mert.out &
>
> Should I just add --threads after mert-moses.pl ?
>
> Further "compile moses to use KenLM and configure the SMT model to use
> KenLM":
>
> 1) compile moses to use KenLM: "KenLM is compiled by default." Should I
> just remove the flag --with-irstlm=<root dir of the IRSTLM toolkit> ?
> And add  8 <factor> <size> filename.arpa to moses.ini?
>
> 2) looking at http://kheafield.com/code/kenlm/ I suppose I can build a
> KenLM 3-gram language model by:
> bin/lmplz -o 3 -S 80% -T /tmp <text >text.arpa
> Is there any more to it?
>
> Yours,
> Per Tunedal
>
>
> On Mon, Apr 29, 2013, at 17:49, Tom Hoar wrote:
>> When you said "it didn't work," what do you mean? How many cores were on
>> the tuning machine? You should also run mert-moses.pl with the --threads
>> option so the mert binary runs multithreaded. That's in addition to the
>> --decoder-flags "-threads all" option Ken mentioned, which tells the
>> moses binary to run multithreaded.
>>
>> You also have to compile moses to use KenLM and configure the SMT model
>> to use KenLM, not IRSTLM. IRSTLM is still single threaded. Most of the
>> tuning time is moses creating the translations. Moses will run single
>> threaded when configured IRSTLM.
>>
>> Tom
>>
>>
>> On 04/29/2013 10:33 PM, Arezki Sadoune wrote:
>>> Dear All,
>>>
>>> I'm currently working on a Phrase-based model from french to english.
>>> Assuming that the bitext corpora is very large, is there any way to
>>> use the multi-thread for the tuning purpose?
>>>
>>> I've already tried by the past to tune a similar system but it has
>>> taken 30 days on a single core.
>>>
>>> I've actually tried multithreaded tuning by adding the option -threads
>>> 16 to the mert script parameter (
>>> /mosesdecoder/scripts/training/mert-moses.pl
>>>   home/Moses/mosesdecoder/tunning1/tunning.true.fr
>>> /home/Moses/mosesdecoder/tunning1/tunning.true.en
>>> /home/Moses/mosesdecoder/bin/moses -threads 16 ...)
>>>
>>> but it didn't work.
>>>
>>> Thanks a lot
>>>
>>> Az
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
