ngram-count writes its LM to the file given by the -lm argument: "europarl/lm/europarl.lm" in your case ("working-dir/lm/europarl.lm" in the baseline instructions).
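If you want to check on it from a second terminal, something along these lines should do. This is only a rough sketch: lm_status is a helper name I made up (it is not part of SRILM or Moses), and the path is the one from your command. A missing file is not by itself a sign of trouble.

```shell
#!/bin/sh
# lm_status is a made-up helper for this example, not an SRILM or Moses tool.
# It reports whether the file passed as $1 is missing, empty, or has content.
lm_status() {
    if [ ! -e "$1" ]; then
        echo "not written yet"
    elif [ ! -s "$1" ]; then
        echo "exists but empty"
    else
        # wc pads its output on some systems, so strip the spaces
        echo "exists, $(wc -c < "$1" | tr -d ' ') bytes"
    fi
}

# Is ngram-count still running? pgrep -l lists matching PIDs and names.
pgrep -l ngram-count || echo "no ngram-count process found"

# Path taken from the command in your email; adjust if you ran it elsewhere.
lm_status europarl/lm/europarl.lm
```

Run it from the directory where you started ngram-count (home/llio/MOSESMTDATA in your case), since the path is relative.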
I think it counts all ngrams first and then writes the file once at the end, so you probably didn't corrupt the output by accidentally starting a second process.

If you want it to train quicker, or you don't have enough memory, try an order of 4 or even 3. Higher-order LMs take more time to calculate and more RAM to hold in memory. The "-lm 0:5:working-dir/lm/europarl.lm:0" arg to train-factored-phrase-model includes the LM order, so change that 5 to the appropriate number when you run that step.

You mentioned having trouble getting stderr from train-factored-phrase-model in another email, and it seems like ngram-count is making your system unresponsive. Do a web search and learn about the unix 'nohup' and 'nice' commands, as well as redirecting stderr and stdout to a file, and running processes in the background. You'll end up with something like this, which might not thrash your system as much, and won't require that you leave a terminal window open the whole time a process runs:

    nohup nice ngram-count -order 4 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm &> ngram-run.out &

Someone familiar with the Ubuntu packages will have to answer whether the moses installation is added to the path, how to call the training scripts, and whether the moses/scripts directory is made & released.

-Josh

On 14 Aug 2008, at 12:02, Llio Humphreys wrote:

> Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai,
> thank you all for your help. It is very, very much appreciated. I
> decided to try Eric's packages, and it looks like the installation
> worked. I typed some of the
> commands in the Baseline instructions without arguments, and the
> program either output to the screen that I missed some arguments or
> gave a description of the program. Thank you Eric!!!
>
> Following the Baseline instructions
> (http://www.statmt.org/wmt08/baseline.html) I have now got to the
> following step:
>
> Use SRILM to build language model:
> /path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount
> -text working-dir/lm/europarl.lowercased -lm
> working-dir/lm/europarl.lm
>
> In my case, I was in folder home/llio/MOSESMTDATA. I didn't know the
> path to ngram-count, but it was possible to invoke it without the
> path:
>
> ngram-count -order 5 -interpolate -kndiscount -text
> europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm
>
> I'm concerned about two things:
> 1) this ngram-count step is taking a very long time. I think I
> started
> it off around 6pm yesterday, but it's still going. It's very
> resource-intensive, and it's difficult to get to other windows open.
> I went to check up on it around 9pm, and couldn't find that particular
> terminal. I thought I had closed that terminal by mistake, so I
> stupidly
> opened another one, and entered the same command. I subsequently
> found that the original terminal was still open, so I closed the
> second one. I'm not sure if issuing this command a second time on the
> same program and files on a different terminal would corrupt the
> original ngram-count step, and whether I should start it off again, or
> whether starting it off again would make things worse? I looked up
> ngram-count
> (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html)
> and I don't think it outputs to any file, so I guess you have to be in
> the same terminal to do the next step? I opened
> another terminal and typed 'top' to see what processes are running,
> and I know that ngram-count is doing something, but whether it's doing
> well or stuck in a loop, I can't say. What I do find strange is that
> the time for ngram-count is said to be 00:58:20, and it's been going
> for hours..
> I searched this problem in previous Moses Group emails and
> I understand that if I run this with order 4 instead of 5 it will run
> quicker with very similar results? So, can I just stop what it's
> doing, and run this command in the same terminal with order 4? Are
> there any files I need to 'touch' to ensure that it doesn't leave any
> stone unturned?
>
> 2) how to do the next step:
>
> bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl
> -scripts-root-dir bin/moses-scripts/scripts-YYYYMMDD-HHMM -root-dir
> working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en
> -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:5:working-dir/lm/europarl.lm:0
>
> I assume that like ngram-count, I can just type in
> train-factored-phrase-model.perl without the full path... Do I need to
> set the -scripts-root-dir parameter? Are all the scripts in the same
> place?
>
> Thank you,
>
> Llio

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
