Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model and Train Model

Llio Humphreys Thu, 14 Aug 2008 05:41:43 -0700

Dear Josh,
Ok, I ran the command, with order 3 as I'm just testing if this system
works on this machine, and it produced europarl.lm in seconds and
output:
[1] 15789


n-gram.out said:
nohup: ignoring input

nohup.out said:
Warning: ngram-count option "-text" needs an argument
one of required modified KneserNey count-of-counts is zero
error in discount estimator for order 1

But I've looked at europarl.lm and it looks fine to me, it even ends
with \end\ so it obviously finished the process.

I guess if there's anything wrong, I'll find out in the next step?
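
A slightly stronger check than eyeballing the file, as a minimal sketch. It assumes europarl.lm is in the usual ARPA text format, with a header of "ngram N=count" lines and a closing \end\ line. The helper name check_arpa is made up for illustration and is not part of SRILM or Moses.

```shell
# check_arpa FILE: hypothetical helper, not an SRILM or Moses tool.
# Prints the n-gram counts from the ARPA header and returns non-zero if
# the header is missing, any order has zero entries, or \end\ is absent.
check_arpa() {
    lm="$1"
    # Header lines look like "ngram 3=12345".
    grep -E '^ngram [0-9]+=[0-9]+' "$lm" || {
        printf '%s\n' 'no ngram counts found in header' >&2
        return 1
    }
    if grep -Eq '^ngram [0-9]+=0$' "$lm"; then
        printf '%s\n' 'at least one order has zero n-grams' >&2
        return 1
    fi
    # The last non-empty line of the file should be the \end\ marker.
    last=$(grep -v '^[[:space:]]*$' "$lm" | tail -n 1)
    [ "$last" = '\end\' ] || {
        printf '%s\n' 'closing \end\ marker is missing' >&2
        return 1
    }
}

# Usage (path as in this thread):
# check_arpa europarl/lm/europarl.lm
```

A file can end with \end\ and still hold an empty or truncated model, so the counts in the header are worth a look given the discount estimator error above.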

Llio

On Thu, Aug 14, 2008 at 12:35 PM, Josh Schroeder <[EMAIL PROTECTED]> wrote:
> ngram-count is outputting an LM file specified by the -lm argument.
> "working-dir/lm/europarl.lm" in your case.
>
> I think it counts all ngrams first and then writes the file once at the end,
> so you probably didn't corrupt the output by accidentally starting a new
> process.
>
> If you want it to train more quickly, or you don't have enough memory, try
> an order of 4 or even 3. Higher-order LMs take more time to calculate and
> more RAM to hold in memory. The "-lm 0:5:working-dir/lm/europarl.lm:0" arg to
> train-factored-phrase-model includes the LM order, so change that 5 to the
> appropriate number when you run that step.
>
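One way to keep the order in the two steps from drifting apart is to set it once and build the -lm value from it. This is only a sketch: lm_spec is a hypothetical helper, and the leading and trailing 0 fields are copied unchanged from the baseline command rather than interpreted here.

```shell
# lm_spec ORDER FILE: hypothetical helper that prints a value for -lm.
# Only the second field (the LM order) varies; the first and last fields
# are kept exactly as they appear in the baseline instructions.
lm_spec() {
    printf '0:%s:%s:0\n' "$1" "$2"
}

ORDER=3
LM=working-dir/lm/europarl.lm

# The same two variables would then feed both steps, for example:
#   ngram-count -order "$ORDER" ... -lm "$LM"
#   train-factored-phrase-model.perl ... -lm "$(lm_spec "$ORDER" "$LM")"
lm_spec "$ORDER" "$LM"   # prints 0:3:working-dir/lm/europarl.lm:0
```
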
> You mentioned having trouble getting stderr from train-factored-phrase-model
> in another email, and it seems like ngram-count is making your system
> unresponsive. Do a web search and learn about the unix 'nohup' and 'nice'
> commands, as well as redirecting stderr and stdout to a file, and running
> processes in the background. You'll end up with something like this, which
> might not thrash your system as much, and won't require that you leave a
> terminal window open the whole time a process runs:
>
> nohup nice ngram-count -order 4 -interpolate -kndiscount -text
> europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm &> ngram-run.out
> &
>
> Someone familiar with the Ubuntu packages will have to answer whether the
> moses installation is added to the path, how to call the training scripts,
> and if the moses/scripts directory is made & released.
>
> -Josh
>
> On 14 Aug 2008, at 12:02, Llio Humphreys wrote:
>
>> Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai,
>> thank you all for your help. It is very, very much appreciated. I
>> decided to try Eric's packages, and it looks like the installation
>> worked. I typed some of the commands in the Baseline instructions
>> without arguments, and the program either output to the screen that
>> I missed some arguments or gave a description of the program. Thank
>> you Eric!!!
>>
>> Following the Baseline instructions
>> (http://www.statmt.org/wmt08/baseline.html) I have now got to the
>> following step:
>>
>> Use SRILM to build language model:
>> /path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount
>> -text working-dir/lm/europarl.lowercased -lm
>> working-dir/lm/europarl.lm
>>
>> In my case, I was in folder home/llio/MOSESMTDATA.  I didn't know the
>> path to ngram-count, but it was possible to invoke it without the
>> path:
>>
>> ngram-count -order 5 -interpolate -kndiscount -text
>> europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm
>>
>> I'm concerned about two things:
>> 1) this ngram-count step is taking a very long time.  I think I started
>> it off around 6pm yesterday, but it's still going.  It's very
>> resource-intensive, and it's difficult to get to other open windows.
>> I went to check up on it around 9pm, and couldn't find that particular
>> terminal.  I thought I had closed that terminal by mistake, so I stupidly
>> opened another one, and entered the same command.  I subsequently
>> found that the original terminal was still open, so I closed the
>> second one.  I'm not sure if issuing this command a second time on the
>> same program and files on a different terminal would corrupt the
>> original ngram-count step, and whether I should start it off again, or
>> whether starting it off again would make things worse?   I looked up
>> ngram-count
>> (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html)
>> and I don't think it outputs to any file, so I guess you have to be in
>> the same terminal to do the next step?  I opened
>> another terminal and typed 'top' to see what processes are running,
>> and I know that ngram-count is doing something, but whether it's doing
>> well or stuck in a loop, I can't say.  What I do find strange is that
>> the time for ngram-count is said to be 00:58:20, and it's been going
>> for hours. I searched for this problem in previous Moses Group emails and
>> I understand that if I run this with order 4 instead of 5 it will run
>> quicker with very similar results?  So, can I just stop what it's
>> doing, and run this command in the same terminal with order 4?  Are
>> there any files I need to 'touch' to ensure that it doesn't leave any
>> stone unturned?
>>
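On the 00:58:20 figure: the TIME column in top is accumulated CPU time, not wall-clock time, so a process that spends much of its life waiting (for example on swap or disk) can show far less than the hours it has been running. A minimal sketch for comparing the two follows; show_times is a hypothetical helper name.

```shell
# show_times PID: hypothetical helper.
# Prints two columns for a running process: elapsed wall-clock time
# since it started, then accumulated CPU time.
show_times() {
    ps -o etime= -o time= -p "$1"
}

# Usage, with the PID taken from top or from the "[1] 15789" style
# job line the shell prints when a command is backgrounded:
# show_times 15789
```

A large gap between the two columns suggests the process is mostly waiting rather than computing.
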
>> 2) how to do the next step:
>>
>>
>> bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl
>> -scripts-root-dir bin/moses-scripts/scripts-YYYYMMDD-HHMM -root-dir
>> working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en
>> -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
>> 0:5:working-dir/lm/europarl.lm:0
>>
>> I assume that like ngram-count, I can just type in
>> train-factored-phrase-model.perl without the full path. Do I need to
>> set the -scripts-root-dir parameter?  Are all the scripts in the same
>> place?
>>
>> Thank you,
>>
>> Llio
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
