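Hey Hoang,

You should binarize the ARPA file. The README of your LM tool (KenLM, IRSTLM or SRILM) explains how; for KenLM, the conversion is done with its build_binary program. A binary model loads far faster than the text ARPA file and can take considerably less memory.

Regards,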
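Before binarizing, it can help to check how many n-grams of each order the model actually contains, since that count (rather than the raw corpus size) is what determines how large both the ARPA file and its binary version will be. The ARPA format lists these counts in its \data\ header as "ngram N=count" lines. The Python sketch below reads only that header and stops at the first n-gram section, so it is cheap even on a 300GB file. It is an illustration under stated assumptions: the function names arpa_ngram_counts and summarize are hypothetical (not part of KenLM, IRSTLM or SRILM), and a ".gz" suffix is assumed to mean a gzip-compressed ARPA file. It does not binarize anything; that is still the job of the LM tool.

```python
import gzip
import re
from typing import Dict, Iterable

# Matches header lines such as "ngram 3=123456" inside the \data\ section.
_NGRAM_COUNT = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def arpa_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {order: count} from the \\data\\ header of an ARPA file.

    Reading stops at the first n-gram section (e.g. "\\1-grams:"),
    so only the header is consumed, however large the file is.
    """
    counts: Dict[int, int] = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue  # skip anything that precedes the \data\ marker
        if line.endswith("-grams:"):
            break  # header is over; the n-gram entries start here
        match = _NGRAM_COUNT.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def summarize(path: str) -> None:
    """Hypothetical helper: print per-order counts for the ARPA file at path.

    Assumes a ".gz" suffix means the file is gzip-compressed text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        counts = arpa_ngram_counts(f)
    for order in sorted(counts):
        print(f"{order}-grams: {counts[order]:,}")
    print(f"total: {sum(counts.values()):,}")


if __name__ == "__main__":
    # A tiny in-memory ARPA header, for demonstration only.
    sample = "\\data\\\nngram 1=5\nngram 2=7\nngram 3=4\n\n\\1-grams:\n"
    print(arpa_ngram_counts(sample.splitlines()))  # {1: 5, 2: 7, 3: 4}
```

Calling summarize on your model file (the path is yours to supply) prints one line per order plus a total. Keep in mind that binarizing changes how the n-grams are stored, not how many there are.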
On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]> wrote:
> Hi all,
>
> I have trained an (unpruned) 5-gram language model on a large corpus of 5
> billion words, resulting in an ARPA-format file of roughly 300GB (is this a
> normal LM size for that much monolingual data?). This is obviously too
> big for running an SMT system.
>
> I have read several papers in which the systems use language models trained
> on monolingual corpora of similar size. Could you give me some advice on how
> to handle this, so that running SMT systems becomes feasible?
>
> I appreciate your help a lot,
> Best,
> --
>
> Best Regards,
> Hoang Cuong
> SMTNerd
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Raj Dabre.
Research Student,
Graduate School of Informatics, Kyoto University.
CSE MTech, IITB., 2011-2014
