Hi all,

I have trained an unpruned 5-gram language model on a large corpus of 5 billion words, resulting in an ARPA-format file of roughly 300GB. (Is that a normal LM size for this much monolingual data?) This is obviously too big to use when running an SMT system.

I have read several papers whose systems use language models trained on monolingual corpora of similar size. Could you give me some advice on how to handle this, so that running an SMT system with such a model becomes feasible?

I appreciate your help a lot.

Best,
--
*Best Regards,*
*Hoang Cuong*
*SMTNerd*
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
