Hello everyone,

 

         Recently we have been working on building a very large language model. The
corpus size is about 3G, and our machine has 40G of memory.

         We failed to build a 5-gram language model with SRILM because of
insufficient memory.

         We divided the corpus into 2 parts and trained a language model on
each of them separately; however, this still failed.

         I would like to know approximately how large the corpus can be
when training a 5-gram language model with SRILM.

         As my colleague reported, training a 5-gram language model on a
150M corpus took 9G of memory. Is this normal?
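
For reference, here is a rough extrapolation from the figures above. It is only a sketch: it assumes memory grows linearly with corpus size (probably pessimistic, since the number of distinct n-grams usually grows more slowly than the corpus) and that 1G = 1024M. The function name is made up for illustration.

```python
def linear_memory_estimate(corpus_size, ref_corpus_size, ref_memory):
    """Scale a reference memory measurement linearly with corpus size."""
    return ref_memory * corpus_size / ref_corpus_size


# Figures reported in this message: 9G of memory for a 150M corpus.
ref_corpus_mb = 150.0
ref_memory_gb = 9.0

# "About 3G" corpus, assuming 1G = 1024M; half of it for the 2-part split.
full_corpus_mb = 3 * 1024.0
half_corpus_mb = full_corpus_mb / 2

full_estimate_gb = linear_memory_estimate(full_corpus_mb, ref_corpus_mb, ref_memory_gb)
half_estimate_gb = linear_memory_estimate(half_corpus_mb, ref_corpus_mb, ref_memory_gb)

print(f"whole corpus: ~{full_estimate_gb:.0f}G, one half: ~{half_estimate_gb:.0f}G")
```

Under this assumption even one half of the corpus would need well over our 40G, which would be consistent with both attempts failing.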

         We are now trying to use IRSTLM. Are there any suggestions?

 

 

----------------------------------------------------

Best wishes!

Xianhua Li 

 

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
