Did you try using a binary version of the LM?
There is a tool, 'compile-lm', in the IRSTLM package.
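
A minimal sketch of that conversion step, assuming compile-lm from IRSTLM is on the PATH and accepts the input and output files as its two positional arguments. The function name binarize_lm and the output name corpus.blm are made up for illustration. It works on an LM file that already exists, so it covers only the conversion, not the ngram-count run itself.

```shell
#!/bin/sh
# Sketch: convert an existing ARPA-format LM into IRSTLM's binary format.
# Assumptions: compile-lm (IRSTLM) is installed; file names are illustrative.

binarize_lm() {
    arpa_lm="$1"     # e.g. corpus.lm, as written by ngram-count
    binary_lm="$2"   # hypothetical output name, e.g. corpus.blm

    # Fail early if the input LM is missing.
    if [ ! -f "$arpa_lm" ]; then
        echo "input LM not found: $arpa_lm" >&2
        return 1
    fi

    # Fail with the conventional "command not found" code if IRSTLM is absent.
    if ! command -v compile-lm >/dev/null 2>&1; then
        echo "compile-lm not found on PATH; install IRSTLM first" >&2
        return 127
    fi

    compile-lm "$arpa_lm" "$binary_lm"
}

# Usage (not run here): binarize_lm corpus.lm corpus.blm
```

The two checks are only there so that a missing file or a missing IRSTLM install gives a readable message instead of a confusing failure further on.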

good luck!

jorg


Felipe Sánchez Martínez wrote:
> Hello all,
> 
> I am training the SMT baseline system using the data provided at
> http://www.statmt.org/wmt09/translation-task.html on a Linux server
> with 16 GB of RAM.
> 
> To train the language model I am using the corpora found at
> http://www.statmt.org/wmt09/training-monolingual.tar (more precisely,
> the concatenation of the files europarl-v4.en.gz and
> news-train08.en.gz). The corpus is around 550 million words.
> 
> The command line used to train the language model is:
> 
> srilm-1.5.7/bin/x86_64/ngram-count -order 5 -interpolate -kndiscount
> -text corpus.lowercased -lm corpus.lm
> 
> It runs out of memory (16 GB!!) and starts using swap.
> 
> Is this normal? How could I deal with it without using a smaller corpus?
> 
> Does anyone know why news-train08.en.gz is much larger than the rest of
> the news-train08 files?
> 
> Thanks in advance for your valuable help.
> 
> Regards
> 


-- 

Jörg


***********/\/\/\/\/\/\/\/\/\/\/\************************************
**  Jörg Tiedemann                 [EMAIL PROTECTED]              **
**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **
**  Rijksuniversiteit Groningen    Harmoniegebouw, room 1311-429   **
**  Postbus 716                    phone: +31 (0)50-363 5935       **
**  9700 AS Groningen              fax:   +31 (0)50-363 6855       **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
