Could we add all these tricks of the trade to the Moses website, for example, 
at http://www.statmt.org/moses/?n=Moses.Optimize
(also for other topics?). I would really like that ...


Cheers,
Jörg


On Nov 25, 2014, at 12:45 PM, Holger Schwenk wrote:

> Hello,
> 
> another option is to perform data selection to only keep the data relevant to 
> your task.
> This usually improves performance, and as a nice side effect, your LM is 
> much smaller ;-)
> 
> Many people use the algorithm proposed by Moore and Lewis, which is 
> implemented in the freely available tool XenC (on GitHub).
> 
> best,
> 
> Holger
> 
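
For readers unfamiliar with Moore-Lewis selection, here is a minimal sketch of
the idea: score each sentence of the general corpus by the difference between
its per-word cross-entropy under an in-domain LM and under a general-domain LM,
and keep the sentences with the lowest scores. This is not XenC's code. The
function names, the add-one-smoothed unigram models, and the threshold of 0 are
all assumptions made to keep the example short; a real setup would use proper
n-gram LMs and would typically train the general LM on a sample of the general
corpus.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total token count) for a list of sentences."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(tokens, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log2((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )
    return -log_prob / len(tokens)


def moore_lewis_select(in_domain, general, threshold=0.0):
    """Keep general-corpus sentences whose cross-entropy difference
    H_in(s) - H_gen(s) is below `threshold` (hypothetical default: 0)."""
    in_model = train_unigram(in_domain)
    gen_model = train_unigram(general)
    # Shared vocabulary size, plus one slot for unseen words.
    vocab_size = len(set(in_model[0]) | set(gen_model[0])) + 1

    scored = []
    for sentence in general:
        tokens = sentence.split()
        if not tokens:
            continue  # skip empty lines
        diff = (cross_entropy(tokens, in_model, vocab_size)
                - cross_entropy(tokens, gen_model, vocab_size))
        scored.append((diff, sentence))

    scored.sort()  # lowest difference first, i.e. most in-domain-like
    return [sentence for diff, sentence in scored if diff < threshold]
```

The LM would then be trained on the selected sentences only. In practice the
threshold is usually tuned, for example by the perplexity of the resulting LM
on held-out in-domain data.
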
> On 11/25/2014 12:02 PM, Hoang Cuong wrote:
>> Hi Raj, Tom and Marcin,
>> I binarized the ARPA file last night, following your suggestion. In the end, 
>> it resulted in a binarized LM file of roughly 100GB (@Marcin: it is not the 
>> 20-30GB you suggested; is this size okay?)
>> Fortunately, the infrastructure at my university allows me to run 
>> experiments with that.
>> Thanks a lot for your help. 
>> It is so great to play with such huge LMs :))
>> Best,
>> 
>> 
>> On Mon, Nov 24, 2014 at 3:19 PM, Marcin Junczys-Dowmunt wrote:
>> The command
>> 
>> moses/bin/build_binary trie -a 22 -b 8 -q 8 lm.arpa lm.kenlm
>> 
>> will build a compressed binarized model with quantization. You can run
>> 
>> moses/bin/build_binary lm.arpa
>> 
>> without any parameters to get size estimates for different parameter 
>> settings. I would guess you will get a binarized LM of roughly 20 to 30 GB, 
>> which is manageable (provided the size you gave us is that of an uncompressed 
>> text file). You can also use lmplz to build pruned models in the first 
>> place; these will be much smaller.
>> 
>> On 2014-11-24 15:11, Tom Hoar wrote:
>> 
>>> After binarizing such a large ARPA file with KenLM, you'll need to 
>>> configure your moses.ini file to "lazily load the model using mmap." This 
>>> involves using lmodel-file code "9" vs code "8." More details here: 
>>> https://kheafield.com/code/kenlm/moses/
>>> 
>>> Performance improves significantly if you store the binarized file on an 
>>> SSD.
>>> 
>>> On 11/24/2014 07:00 PM, Raj Dabre wrote:
>>>> Hey Hoang,
>>>> You should binarize the ARPA file.
>>>> The readme of the LM tool (KenLM or IRSTLM or SRILM) will tell you how.
>>>> Regards.
>>>> 
>>>> On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong wrote:
>>>> Hi all,
>>>> I have trained an (unpruned) 5-gram language model on a large corpus of 5 
>>>> billion words, resulting in an ARPA-format file of roughly 300GB (is this a 
>>>> normal LM size for such a large monolingual corpus?). This is obviously too 
>>>> big for running an SMT system.
>>>> I have read several papers whose systems use language models trained on 
>>>> monolingual corpora of similar size. Could you give me some advice on how to 
>>>> handle this and make it feasible to run SMT systems?
>>>> I appreciate your help a lot,
>>>> Best,
>>>> -- 
>>>> Best Regards,
>>>> Hoang Cuong
>>>> SMTNerd
>>>> 
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>> 
>>>> -- 
>>>> Raj Dabre.
>>>> Research Student,
>>>> Graduate School of Informatics,
>>>> Kyoto University.
>>>> CSE MTech, IITB., 2011-2014
>> 
>> -- 
>> Best Regards,
>> Hoang Cuong
>> SMTNerd 
