Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model and Train Model

Josh Schroeder Thu, 14 Aug 2008 04:36:34 -0700

ngram-count writes its LM to the file given by the -lm argument:
"europarl/lm/europarl.lm" in your case ("working-dir/lm/europarl.lm"
in the baseline instructions).


I think it counts all ngrams first and then writes the file once at  
the end, so you probably didn't corrupt the output by accidentally  
starting a new process.

If you want it to train more quickly, or don't have enough memory, try
an order of 4 or even 3. Higher-order LMs take more time to estimate
and more RAM to hold in memory. The "-lm 0:5:working-dir/lm/europarl.lm:0"
argument to train-factored-phrase-model.perl includes the LM order, so
change that 5 to match the order you used when you run that step.
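
One way to keep the two steps in agreement is to set the order once and
build both commands from it. This is only a sketch: the variable names
ORDER, LM_FILE and LM_ARG are made up for illustration, the paths are
the ones from the baseline instructions, and the commands are echoed
rather than run.

```shell
# Hypothetical variable names; paths follow the baseline instructions.
ORDER=4
LM_FILE=working-dir/lm/europarl.lm

# The second field of the -lm argument is the LM order.
LM_ARG="0:${ORDER}:${LM_FILE}:0"

# Print the two commands so the order can be checked by eye before running.
echo "ngram-count -order ${ORDER} -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm ${LM_FILE}"
echo "train-factored-phrase-model.perl ... -lm ${LM_ARG}"
```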

You mentioned in another email having trouble getting stderr from
train-factored-phrase-model, and it seems like ngram-count is making
your system unresponsive. It's worth reading up on the Unix 'nohup'
and 'nice' commands, as well as redirecting stderr and stdout to a
file and running processes in the background. You'll end up with
something like this, which might not thrash your system as much and
won't require you to leave a terminal window open the whole time the
process runs:

nohup nice ngram-count -order 4 -interpolate -kndiscount \
  -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm \
  &> ngram-run.out &
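
To try the pattern without waiting hours, here is a small sketch with a
stand-in job in place of ngram-count. The file name demo-run.out and the
variable JOB_PID are made up for illustration. It uses "> file 2>&1",
which does the same job as "&>" but also works in shells other than
bash.

```shell
# Stand-in for a long-running job; swap in the real ngram-count command.
nohup nice sh -c 'echo started; sleep 1; echo finished' > demo-run.out 2>&1 &
JOB_PID=$!   # PID of the background job just started

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$JOB_PID" 2>/dev/null; then
    echo "job $JOB_PID is still running"
fi

# Block until the job ends, then look at everything it printed.
wait "$JOB_PID"
cat demo-run.out
```

While a real job is running, "tail -f ngram-run.out" in any terminal
shows new output as it is written.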

Someone familiar with the Ubuntu packages will have to answer whether  
the moses installation is added to the path, how to call the training  
scripts, and if the moses/scripts directory is made & released.

-Josh

On 14 Aug 2008, at 12:02, Llio Humphreys wrote:

> Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai,
> thank you all for your help.  It is very, very much appreciated. I
> decided to try Eric's packages, and it looks like the installation
> worked.  I typed some of the
> commands in the Baseline instructions without arguments, and the
> program either output to the screen that I missed some arguments or
> gave a description of the program.  Thank you Eric!!!
>
> Following the Baseline instructions
> (http://www.statmt.org/wmt08/baseline.html) I have now got to the
> following step:
>
> Use SRILM to build language model:
> /path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount
> -text working-dir/lm/europarl.lowercased -lm
> working-dir/lm/europarl.lm
>
> In my case, I was in folder home/llio/MOSESMTDATA.  I didn't know the
> path to ngram-count, but it was possible to invoke it without the
> path:
>
> ngram-count -order 5 -interpolate -kndiscount -text
> europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm
>
> I'm concerned about two things:
> 1) this ngram-count step is taking a very long time.  I think I started
> it off around 6pm yesterday, but it's still going.  It's very
> resource-intensive, and it's difficult to get to other open windows.
> I went to check up on it around 9pm, and couldn't find that particular
> terminal.  I thought I had closed that terminal by mistake, so I stupidly
> opened another one, and entered the same command.  I subsequently
> found that the original terminal was still open, so I closed the
> second one.  I'm not sure if issuing this command a second time on the
> same program and files on a different terminal would corrupt the
> original ngram-count step, and whether I should start it off again, or
> whether starting it off again would make things worse?   I looked up
> ngram-count
> (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html)
> and I don't think it outputs to any file, so I guess you have to be in
> the same terminal to do the next step?  I opened
> another terminal and typed 'top' to see what processes are running,
> and I know that ngram-count is doing something, but whether it's doing
> well or stuck in a loop, I can't say.  What I do find strange is that
> the time for ngram-count is said to be 00:58:20, and it's been going
> for hours.. I searched this problem in previous Moses Group emails and
> I understand that if I run this with order 4 instead of 5 it will run
> quicker with very similar results?  So, can I just stop what it's
> doing, and run this command in the same terminal with order 4?  Are
> there any files I need to 'touch' to ensure that it doesn't leave any
> stone unturned?
>
> 2) how to do the next step:
>
> bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl
> -scripts-root-dir bin/moses-scripts/scripts-YYYYMMDD-HHMM -root-dir
> working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en
> -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:5:working-dir/lm/europarl.lm:0
>
> I assume that like ngram-count, I can just type in
> train-factored-phrase-model.perl without the full path...Do I need to
> set the -scripts-root-dir parameter?  Are all the scripts in the same
> place?
>
> Thank you,
>
> Llio


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
