Hi Miles/Josh, thanks for your replies. Looking at the options, the first one seems the easiest to try:
> --specify a lower order model (eg 4 rather than 5, or even 3); depending
> upon how much monolingual training material you have, this may not produce
> worse results and it will certainly run faster and will require less space.

I take it that it won't be a problem to stop what it's doing and run this command in the same terminal with order 4. So, I'll proceed with Josh's suggestion:

nohup nice ngram-count -order 4 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm &> ngram-run.out &

Many thanks,
Llio

On Thu, Aug 14, 2008 at 12:29 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> (my message bounced as it was too long ... here is a truncated version)
>
> Miles
>
> ---------- Forwarded message ----------
> From: Miles Osborne <[EMAIL PROTECTED]>
> Date: 2008/8/14
> Subject: Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model and Train Model
> To: Llio Humphreys <[EMAIL PROTECTED]>
> Cc: moses-support <[email protected]>
>
> Building language models (using, for example, ngram-count) is computationally
> expensive. From what you tell the list, it seems that you don't have enough
> physical memory to run it properly.
>
> You have a number of options:
>
> --Specify a lower order model (eg 4 rather than 5, or even 3); depending
> upon how much monolingual training material you have, this may not produce
> worse results, and it will certainly run faster and require less space.
>
> --Divide your language model training material into chunks and run
> ngram-count on each chunk. This is one strategy for building LMs using all
> of the Gigaword corpus (when you don't have access to a 64-bit machine).
> Here you would create multiple LMs.
>
> --Use a disk-based method of creating them. We have done this; basically
> it trades speed for memory.
>
> --Take the radical option and simply don't bother smoothing at all (ie use
> Google's "stupid backoff").
> This makes training LMs trivial -- just compute the counts of ngrams and
> work out how to store them. I reckon it should be possible to do this and
> create an ARPA file suitable for loading into SRILM.
>
> --Buy more machines.
>
> Miles

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
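[Editor's note] The chunk-and-merge strategy Miles describes -- count n-grams per chunk, then combine the per-chunk counts -- can be illustrated in miniature with the toy sketch below. SRILM's ngram-count -write and ngram-merge do the same job at corpus scale; the function names and the two-sentence "corpus" here are made up for illustration only.

```python
from collections import Counter

def ngram_counts(sentences, order):
    """Count all n-grams up to `order` in an iterable of tokenized sentences."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]  # sentence-boundary markers
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def merge_counts(*chunk_counts):
    """Merge per-chunk count tables by summing, as ngram-merge does."""
    total = Counter()
    for c in chunk_counts:
        total.update(c)
    return total

# Two "chunks" of a toy corpus, counted separately and then merged
chunk1 = [["the", "cat", "sat"]]
chunk2 = [["the", "cat", "slept"]]
merged = merge_counts(ngram_counts(chunk1, 2), ngram_counts(chunk2, 2))
print(merged[("the", "cat")])  # 2: the bigram appears once in each chunk
```

Because the merge is a plain sum, each chunk can be counted on its own (even on separate machines) and only the count tables need to fit together at the end.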
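[Editor's note] "Stupid backoff" really is as simple as the quoted message suggests: score an n-gram by its relative frequency, and when the count is zero, back off to the shorter context with a fixed multiplier (0.4 in Brants et al., 2007) instead of computing proper discounts. A minimal sketch, with a made-up counting helper and toy corpus:

```python
from collections import Counter

ALPHA = 0.4  # fixed backoff factor from Brants et al. (2007)

def train(sentences, order):
    """Collect raw n-gram counts up to `order`; no smoothing needed."""
    counts = Counter()
    total_words = 0
    for sent in sentences:
        total_words += len(sent)
        for n in range(1, order + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts, total_words

def score(counts, total_words, context, word):
    """Stupid-backoff score S(word | context): relative frequency if the
    full n-gram was seen, else ALPHA times the score with a shorter context."""
    ngram = context + (word,)
    if counts[ngram] > 0:
        denom = counts[context] if context else total_words
        return counts[ngram] / denom
    if context:
        return ALPHA * score(counts, total_words, context[1:], word)
    return 0.0  # word never seen at all

counts, N = train([["the", "cat", "sat"], ["the", "dog", "sat"]], 3)
print(score(counts, N, ("the",), "cat"))  # 0.5: "the cat" once, "the" twice
print(score(counts, N, ("dog",), "cat"))  # backs off to the unigram: 0.4 * 1/6
```

Note the scores are not probabilities (they need not sum to one), which is exactly why no smoothing is required -- and why writing them into an ARPA file for SRILM, as the message suggests, takes some care with the backoff fields.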
