Hi Miles/Josh, thanks for your replies. Looking at the options, the first one seems the easiest to try:
> --specify a lower order model (eg 4 rather than 5, or even 3); depending
> upon how much monolingual training material you have, this may not produce
> worse results and it will certainly run faster and will require less space.

I take it that it won't be a problem to stop what it's doing and run this command in the same terminal with order 4. So, I'll proceed with Josh's suggestion:

nohup nice ngram-count -order 4 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm &> ngram-run.out &

Many thanks,
Llio

On Thu, Aug 14, 2008 at 12:29 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> (my message bounced as it was too long ... here is a truncated version)
>
> Miles
>
> ---------- Forwarded message ----------
> From: Miles Osborne <[EMAIL PROTECTED]>
> Date: 2008/8/14
> Subject: Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model and Train Model
> To: Llio Humphreys <[EMAIL PROTECTED]>
> Cc: moses-support <[email protected]>
>
> Building language models (using, for example, ngram-count) is computationally
> expensive. From what you tell the list, it seems that you don't have enough
> physical memory to run it properly.
>
> You have a number of options:
>
> --Specify a lower order model (eg 4 rather than 5, or even 3); depending
> upon how much monolingual training material you have, this may not produce
> worse results, and it will certainly run faster and require less space.
>
> --Divide your language model training material into chunks and run
> ngram-count on each chunk. This is one strategy for building LMs using all
> of the Gigaword corpus (when you don't have access to a 64-bit machine).
> Here you would create multiple LMs.
>
> --Use a disk-based method of creating them. We have done this; basically
> it trades speed for memory.
>
> --Take the radical option and simply don't bother smoothing at all (ie use
> Google's "stupid backoff").
> This makes training LMs trivial -- just compute the counts of ngrams and
> work out how to store them. I reckon it should be possible to do this and
> create an ARPA file suitable for loading into SRILM.
>
> --Buy more machines.
>
> Miles

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
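[Editor's note] The chunk-and-merge strategy Miles describes -- count n-grams per chunk, then combine the per-chunk counts -- can be illustrated in miniature with the toy sketch below. SRILM's ngram-count -write and ngram-merge do the same job at corpus scale; the function names and the two-sentence "corpus" here are made up for illustration only.

```python
from collections import Counter

def ngram_counts(sentences, order):
    """Count all n-grams up to `order` in an iterable of tokenized sentences."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]  # sentence-boundary markers
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def merge_counts(*chunk_counts):
    """Merge per-chunk count tables by summing, as ngram-merge does."""
    total = Counter()
    for c in chunk_counts:
        total.update(c)
    return total

# Two "chunks" of a toy corpus, counted separately and then merged
chunk1 = [["the", "cat", "sat"]]
chunk2 = [["the", "cat", "slept"]]
merged = merge_counts(ngram_counts(chunk1, 2), ngram_counts(chunk2, 2))
print(merged[("the", "cat")])  # 2: the bigram appears once in each chunk
```

Because the merge is a plain sum, each chunk can be counted on its own (even on separate machines) and only the count tables need to fit together at the end.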
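[Editor's note] "Stupid backoff" really is as simple as the quoted message suggests: score an n-gram by its relative frequency, and when the count is zero, back off to the shorter context with a fixed multiplier (0.4 in Brants et al., 2007) instead of computing proper discounts. A minimal sketch, with a made-up counting helper and toy corpus:

```python
from collections import Counter

ALPHA = 0.4  # fixed backoff factor from Brants et al. (2007)

def train(sentences, order):
    """Collect raw n-gram counts up to `order`; no smoothing needed."""
    counts = Counter()
    total_words = 0
    for sent in sentences:
        total_words += len(sent)
        for n in range(1, order + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts, total_words

def score(counts, total_words, context, word):
    """Stupid-backoff score S(word | context): relative frequency if the
    full n-gram was seen, else ALPHA times the score with a shorter context."""
    ngram = context + (word,)
    if counts[ngram] > 0:
        denom = counts[context] if context else total_words
        return counts[ngram] / denom
    if context:
        return ALPHA * score(counts, total_words, context[1:], word)
    return 0.0  # word never seen at all

counts, N = train([["the", "cat", "sat"], ["the", "dog", "sat"]], 3)
print(score(counts, N, ("the",), "cat"))  # 0.5: "the cat" once, "the" twice
print(score(counts, N, ("dog",), "cat"))  # backs off to the unigram: 0.4 * 1/6
```

Note the scores are not probabilities (they need not sum to one), which is exactly why no smoothing is required -- and why writing them into an ARPA file for SRILM, as the message suggests, takes some care with the backoff fields.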
