Re: [Moses-support] Too large language models - how to handle that?

Raj Dabre Mon, 24 Nov 2014 04:03:07 -0800

Hey Hoang,
You should binarize the ARPA file.
The README of your LM toolkit (KenLM, IRSTLM, or SRILM) explains how.
Regards.
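For KenLM, binarization is done with its `build_binary` tool. Below is a minimal Python sketch, assuming KenLM is installed and `build_binary` is on your PATH. The helper names `build_binary_command` and `binarize` are hypothetical. The `trie` structure and the `-q`/`-b` quantization flags follow the KenLM documentation, but check the usage message printed by `build_binary` for your version. IRSTLM and SRILM ship their own binarization tools, which this sketch does not cover.

```python
import shutil
import subprocess
from typing import List, Optional


def build_binary_command(arpa_path: str, binary_path: str,
                         structure: str = "trie",
                         prob_bits: Optional[int] = None,
                         backoff_bits: Optional[int] = None) -> List[str]:
    """Build (but do not run) the argv for KenLM's build_binary.

    Hypothetical helper; flags assumed from the KenLM documentation.
    """
    if structure not in ("probing", "trie"):
        raise ValueError("structure must be 'probing' or 'trie'")
    # KenLM only supports quantization with the trie data structure.
    quantized = prob_bits is not None or backoff_bits is not None
    if quantized and structure != "trie":
        raise ValueError("quantization (-q/-b) requires the trie structure")
    cmd = ["build_binary"]
    if prob_bits is not None:
        cmd += ["-q", str(prob_bits)]      # bits per quantized probability
    if backoff_bits is not None:
        cmd += ["-b", str(backoff_bits)]   # bits per quantized backoff weight
    # Options come first, then the structure type, input ARPA, and output file.
    cmd += [structure, arpa_path, binary_path]
    return cmd


def binarize(arpa_path: str, binary_path: str, **kwargs) -> None:
    """Run build_binary; raises if it is missing or exits with an error."""
    if shutil.which("build_binary") is None:
        raise FileNotFoundError("KenLM build_binary not found on PATH")
    subprocess.run(build_binary_command(arpa_path, binary_path, **kwargs),
                   check=True)
```

For example, `binarize("lm.arpa", "lm.binary", prob_bits=8, backoff_bits=8)` would run `build_binary -q 8 -b 8 trie lm.arpa lm.binary`. The trie structure with quantization trades a little precision for a much smaller file than the default probing structure. Moses' KenLM feature can typically load the resulting binary file in place of the ARPA file.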


On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]> wrote:

> Hi all,
> I have trained an (unpruned) 5-gram language model on a large corpus of 5
> billion words, resulting in an ARPA-format file of roughly 300GB (is that a
> normal size for this much monolingual data?). This is obviously too big for
> running an SMT system.
> I have read several papers whose systems use language models trained on
> similar monolingual corpora. Could you give me some advice on how to handle
> this and make it feasible to run SMT systems?
> I would really appreciate your help.
> Best,
> --
>
> Best Regards,
> Hoang Cuong
> SMTNerd
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014

