Re: [Moses-support] Too large language models - how to handle that?

Tom Hoar Mon, 24 Nov 2014 06:15:01 -0800

After binarizing such a large ARPA file with KenLM, you'll need to configure your moses.ini file to "lazily load the model using mmap." This involves using lmodel-file code "9" instead of code "8." More details here: https://kheafield.com/code/kenlm/moses/
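
As a rough illustration of that edit, here is a Python sketch that switches KenLM entries in a moses.ini from type 8 to type 9. It assumes the older moses.ini layout in which each line under `[lmodel-file]` reads `<type> <factor> <order> <path>`. The function name and file paths are hypothetical, so check the KenLM page for the exact syntax your Moses version expects.

```python
def switch_to_lazy_kenlm(ini_text: str) -> str:
    """Rewrite KenLM entries in an old-style moses.ini from type 8 to type 9.

    Assumption: each non-empty line under [lmodel-file] has the form
    "<type> <factor> <order> <path>". Other sections are left untouched.
    """
    out_lines = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are currently inside the [lmodel-file] section.
            in_lm_section = stripped == "[lmodel-file]"
        elif in_lm_section and stripped:
            fields = stripped.split()
            if fields[0] == "8":
                # 8 = KenLM loaded up front, 9 = KenLM loaded lazily via mmap.
                fields[0] = "9"
                line = " ".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical config; the distortion-limit value must stay unchanged.
    example = "[lmodel-file]\n8 0 5 /data/lm/5gram.binary\n\n[distortion-limit]\n6\n"
    print(switch_to_lazy_kenlm(example))
```

Only lines inside `[lmodel-file]` are touched, so a bare number such as a distortion limit in another section is never mistaken for an LM type.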

Performance improves significantly if you store the binarized file on an SSD.
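
Why storage speed matters: with lazy loading the model file is memory-mapped, and the operating system reads pages from disk only when a lookup touches them, so cache misses become random reads. The generic Python sketch below is not KenLM code (`read_slice_lazily` is a hypothetical helper). It shows the same mechanism with the standard `mmap` module: reading a slice typically pulls in only the pages near the requested range (plus any OS readahead), not the whole file.

```python
import mmap
import os
import tempfile


def read_slice_lazily(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map.

    Pages are faulted in on demand, so the file is never read into RAM
    as a whole; each untouched region costs a disk read when first used.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Build a throwaway 10 MB file with a marker at the end.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 10_000_000 + b"ngram")
        name = tmp.name
    try:
        print(read_slice_lazily(name, 10_000_000, 5))  # b'ngram'
    finally:
        os.remove(name)
```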





On 11/24/2014 07:00 PM, Raj Dabre wrote:

Hey Hoang,
You should binarize the ARPA file.
The README of the LM toolkit you used (KenLM, IRSTLM, or SRILM) explains how.
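
For KenLM specifically, binarization is done with its `build_binary` program. It takes an optional data-structure argument (`probing`, the default and faster, or `trie`, which is smaller), followed by the input ARPA file and the output path. Below is a minimal Python wrapper, assuming `build_binary` from a compiled KenLM is on your PATH. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_binary_command(arpa_path: str, out_path: str,
                         structure: str = "trie") -> List[str]:
    """Assemble a KenLM build_binary invocation (no execution)."""
    if structure not in ("probing", "trie"):
        raise ValueError(f"unknown data structure: {structure}")
    return ["build_binary", structure, arpa_path, out_path]


def binarize(arpa_path: str, out_path: str, structure: str = "trie") -> None:
    """Run build_binary, failing early if KenLM is not installed."""
    if shutil.which("build_binary") is None:
        raise FileNotFoundError("build_binary not found on PATH; build KenLM first")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_binary_command(arpa_path, out_path, structure), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own model paths.
    print(" ".join(build_binary_command("5gram.arpa", "5gram.binary")))
```

Defaulting to `trie` trades some lookup speed for a smaller binary, which seems the more practical choice for a model of this size.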
Regards.

On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]> wrote:


    Hi all,
    I have trained an (unpruned) 5-gram language model on a large
    corpus of 5 billion words, resulting in an ARPA-format file of
    roughly 300GB (is that a normal LM size for this much monolingual
    data?). This is obviously too big for running an SMT system.
    I have read several papers whose systems use language models
    trained on similarly large monolingual corpora. Could you give me some
    advice on how to handle this, so that running an SMT system becomes feasible?
    I appreciate your help a lot,
    Best,

    --
    Best Regards,
    Hoang Cuong
    SMTNerd

    _______________________________________________
    Moses-support mailing list
    [email protected] <mailto:[email protected]>
    http://mailman.mit.edu/mailman/listinfo/moses-support




--
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014


