Hi Raj, Tom and Marcin,
I binarized the ARPA file last night, following your suggestion. The
result is a binarized LM file of roughly *100GB* (@Marcin: this is well
above the 20-30GB you estimated; is this size okay?).
Fortunately, the infrastructure at my university allows me to run
experiments with that.
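
For anyone following the thread, the two steps discussed in the quoted
messages (binarizing with build_binary, then pointing moses.ini at the
result with lazy loading) can be sketched roughly as below. This is only
a sketch under assumptions: the paths and the helper names binarize_cmd
and lmodel_line are placeholders, factor 0 in the moses.ini line is
assumed, and the flags are the ones from Marcin's message, not
necessarily the settings that produced the 100GB file.

```shell
#!/bin/sh
# Placeholder paths (assumptions, not from the thread); override via env.
BUILD_BINARY=${BUILD_BINARY:-moses/bin/build_binary}
ARPA=${ARPA:-lm.arpa}
BINLM=${BINLM:-lm.kenlm}

# Print the quantized trie command from Marcin's message for given files.
binarize_cmd() {
    printf '%s trie -a 22 -b 8 -q 8 %s %s\n' "$BUILD_BINARY" "$1" "$2"
}

# Print an old-style [lmodel-file] entry: type, factor, order, path.
# Type 9 is the lazy (mmap) KenLM code mentioned by Tom; factor 0 is an
# assumption; order 5 matches the 5-gram model in the thread.
lmodel_line() {
    printf '9 0 5 %s\n' "$1"
}

# Only run the real tool when both the binary and the ARPA file exist;
# otherwise just show what would be run.
if [ -x "$BUILD_BINARY" ] && [ -f "$ARPA" ]; then
    "$BUILD_BINARY" "$ARPA"    # no other arguments: prints size estimates
    "$BUILD_BINARY" trie -a 22 -b 8 -q 8 "$ARPA" "$BINLM"
else
    echo "dry run: $(binarize_cmd "$ARPA" "$BINLM")"
fi

# moses.ini fragment for lazy loading of the binarized model.
echo "[lmodel-file]"
lmodel_line "$BINLM"
```

Running the size-estimate step first makes it easier to compare the
expected footprint of different settings before committing to a long
build.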
Thanks a lot for your help.
It is so great to play with such huge LMs :))
Best,


On Mon, Nov 24, 2014 at 3:19 PM, Marcin Junczys-Dowmunt <[email protected]>
wrote:

>  The command
>
> moses/bin/build_binary trie -a 22 -b 8 -q 8 lm.arpa lm.kenlm
>
> will build a compressed binarized model with quantization. You can run
>
> moses/bin/build_binary lm.arpa
>
> without any parameters to get size estimates for different parameter
> settings. I would guess you will get a binarized LM of roughly 20 to 30 GB,
> which is manageable (provided the size you gave us is that of an
> uncompressed text file). You can also use lmplz to build pruned models in
> the first place; these will be much smaller.
>
> On 2014-11-24 15:11, Tom Hoar wrote:
>
> After binarizing such a large ARPA file with KenLM, you'll need to
> configure your moses.ini file to "lazily load the model using mmap." This
> means using lmodel-file code "9" instead of code "8". More details here:
> https://kheafield.com/code/kenlm/moses/
>
> Performance improves significantly if you store the binarized file on an
> SSD.
>
>
>
>
> On 11/24/2014 07:00 PM, Raj Dabre wrote:
>
>   Hey Hoang,
> You should binarize the ARPA file.
> The readme of the LM tool (KenLM or IRSTLM or SRILM) will tell you how.
> Regards.
>
> On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]>
> wrote:
>
>> Hi all,
>> I have trained an (unpruned) 5-gram language model on a large corpus of
>> 5 billion words, resulting in an ARPA-format file of roughly 300GB (is
>> this a normal LM size for such a large monolingual corpus?). This is
>> obviously too big for running an SMT system.
>> I have read several papers whose systems use language models trained on
>> similarly sized monolingual corpora. Could you give me some advice on how
>> to handle this so that running an SMT system becomes feasible?
>> I appreciate your help a lot,
>> Best,
>>  --
>>  Best Regards,
>>  Hoang Cuong
>>  SMTNerd
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> --
>  Raj Dabre.
> Research Student,
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
>


-- 

Best Regards,
Hoang Cuong
SMTNerd
