Hey Hoang,
You should binarize the ARPA file rather than loading it as plain text.
The README of your LM toolkit (KenLM, IRSTLM, or SRILM) explains how.
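For KenLM, for instance, the binarization step could be scripted roughly as below. This is only a sketch under some assumptions: KenLM's build_binary tool is compiled and on your PATH, and the file names (lm.5gram.arpa, lm.5gram.binary) and the helper functions are hypothetical placeholders, not anything from your setup. Check the KenLM README for the options that suit your memory budget.

```python
import shlex
import subprocess


def build_binary_command(arpa_path, binary_path, data_structure="trie",
                         build_binary="build_binary"):
    """Return the argument list for KenLM's build_binary tool.

    data_structure is "trie" (smaller in memory) or "probing" (faster
    lookups, more memory). The executable name is an assumption; pass a
    full path if build_binary is not on your PATH.
    """
    if data_structure not in ("trie", "probing"):
        raise ValueError("data_structure must be 'trie' or 'probing'")
    return [build_binary, data_structure, str(arpa_path), str(binary_path)]


def binarize(arpa_path, binary_path, **kwargs):
    """Run build_binary and raise CalledProcessError if it fails."""
    cmd = build_binary_command(arpa_path, binary_path, **kwargs)
    subprocess.run(cmd, check=True)
    return binary_path


# Show the command that would be run (hypothetical file names);
# call binarize(...) with the same arguments to actually execute it.
print(shlex.join(build_binary_command("lm.5gram.arpa", "lm.5gram.binary")))
```

The trie layout is the default here because it is the more memory-frugal of the two, which matters for a model of this size. As far as I know, the resulting binary file can then be given to Moses in place of the ARPA file, but please verify that against the documentation for your versions.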
Regards.

On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]>
wrote:

> Hi all,
> I have trained an (unpruned) 5-gram language model on a large corpus of 5
> billion words, resulting in an ARPA-format file of roughly 300GB (is this a
> normal LM size for such a large monolingual corpus?). This is obviously too
> big for running an SMT system.
> I have read several papers whose systems use language models trained on
> monolingual corpora of similar size. Could you give me some advice on how to
> handle this and make it feasible to run an SMT system?
> I appreciate your help a lot,
> Best,
> --
>
> *Best Regards, Hoang Cuong, SMTNerd*
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014