Hello everyone,
We have recently been working on building a very large language model. The
corpus size is about 3G, and our machine has 40G of memory.
We tried to build a 5-gram language model with SRILM, but it failed because
of insufficient memory.
We then divided the corpus into 2 parts and trained a language model on each
part separately; however, this still failed.
I would like to know approximately how large a corpus can be when training
a 5-gram language model with SRILM.
My colleague reported that training a 5-gram language model on a corpus of
size 150M used 9G of memory. Is this normal?
We are now trying to use IRSTLM. Are there any suggestions?
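
For reference, here is a rough back-of-the-envelope extrapolation from the
figures above. It is only a sketch: it assumes memory grows linearly with
corpus size, which is my own simplification and not something we measured
(n-gram counts may well grow more slowly than that). The function and
variable names are made up for illustration.

```python
# Figures taken from this message.
observed_corpus_m = 150    # corpus size (in M) from my colleague's run
observed_memory_g = 9      # memory (in G) that run used
full_corpus_m = 3000       # our corpus, about 3G, expressed in M
available_memory_g = 40    # memory (in G) on our machine


def estimate_memory_g(corpus_m):
    """Estimate memory in G for a corpus of the given size in M.

    ASSUMPTION: memory scales linearly with corpus size.
    """
    return observed_memory_g * corpus_m / observed_corpus_m


full_estimate = estimate_memory_g(full_corpus_m)
half_estimate = estimate_memory_g(full_corpus_m / 2)

# Largest corpus (in M) that would fit in memory under the same assumption.
max_corpus_m = available_memory_g * observed_corpus_m / observed_memory_g

print(f"whole corpus: about {full_estimate:.0f}G")
print(f"one half:     about {half_estimate:.0f}G")
print(f"fits in {available_memory_g}G: about {max_corpus_m:.0f}M")
```

If the linear assumption is anywhere near right, even half of the corpus
would need more than the 40G we have, which would be consistent with the
2-way split still failing.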
----------------------------------------------------
Best wishes!
Xianhua Li