Dear Josh,

Ok, I ran the command, with order 3 as I'm just testing if this system works
on this machine, and it produced europarl.lm in seconds and output:

[1] 15789

n-gram.out said:

nohup: ignoring input

nohup.out said:

Warning: ngram-count option "-text" needs an argument
one of required modified KneserNey count-of-counts is zero
error in discount estimator for order 1

But I've looked at europarl.lm and it looks fine to me, it even ends with
\end\ so it obviously finished the process. I guess if there's anything
wrong, I'll find out in the next step?

Llio

On Thu, Aug 14, 2008 at 12:35 PM, Josh Schroeder <[EMAIL PROTECTED]> wrote:
> ngram-count is outputting an LM file specified by the -lm argument.
> "working-dir/lm/europarl.lm" in your case.
>
> I think it counts all ngrams first and then writes the file once at the end,
> so you probably didn't corrupt the output by accidentally starting a new
> process.
>
> If you want it to train quicker/don't have enough memory, try an order of 4
> or even 3. Higher order LM models take more time to calculate and more RAM
> to hold in memory. The "-lm 0:5:working-dir/lm/europarl.lm:0" arg to
> train-factored-phrase-model includes the LM order, so change that 5 to the
> appropriate number when you run that step.
>
> You mentioned having trouble getting stderr from train-factored-phrase-model
> in another email, and it seems like ngram-count is making your system
> unresponsive. Do a web search and learn about the unix 'nohup' and 'nice'
> commands, as well as redirecting stderr and stdout to a file, and running
> processes in the background. You'll end up with something like this, which
> might not thrash your system as much, and won't require that you leave a
> terminal window open the whole time a process runs:
>
> nohup nice ngram-count -order 4 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm &> ngram-run.out &
>
> Someone familiar with the Ubuntu packages will have to answer whether the
> moses installation is added to the path, how to call the training scripts,
> and if the moses/scripts directory is made & released.
>
> -Josh
>
> On 14 Aug 2008, at 12:02, Llio Humphreys wrote:
>
>> Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai,
>> thank you all for your help. It is very, very much appreciated. I
>> decided to try Eric's packages, and it looks like the installation
>> worked. I typed some of the commands in the Baseline instructions
>> without arguments, and the program either output to the screen that
>> I missed some arguments or gave a description of the program.
>> Thank you Eric!!!
>>
>> Following the Baseline instructions
>> (http://www.statmt.org/wmt08/baseline.html) I have now got to the
>> following step:
>>
>> Use SRILM to build language model:
>> /path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm
>>
>> In my case, I was in folder home/llio/MOSESMTDATA. I didn't know the
>> path to ngram-count, but it was possible to invoke it without the
>> path:
>>
>> ngram-count -order 5 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm
>>
>> I'm concerned about two things:
>> 1) this ngram-count step is taking a very long time. I think I started
>> it off around 6pm yesterday, but it's still going. It's very
>> resource-intensive, and it's difficult to get to other windows open.
>> I went to check up on it around 9pm, and couldn't find that particular
>> terminal. I thought I had closed that terminal by mistake, so I stupidly
>> opened another one, and entered the same command. I subsequently
>> found that the original terminal was still open, so I closed the
>> second one. I'm not sure if issuing this command a second time on the
>> same program and files on a different terminal would corrupt the
>> original ngram-count step, and whether I should start it off again, or
>> whether starting it off again would make things worse? I looked up
>> ngram-count
>> (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html)
>> and I don't think it outputs to any file, so I guess you have to be in
>> the same terminal to do the next step? I opened
>> another terminal and typed 'top' to see what processes are running,
>> and I know that ngram-count is doing something, but whether it's doing
>> well or stuck in a loop, I can't say. What I do find strange is that
>> the time for ngram-count is said to be 00:58:20, and it's been going
>> for hours.. I searched this problem in previous Moses Group emails and
>> I understand that if I run this with order 4 instead of 5 it will run
>> quicker with very similar results? So, can I just stop what it's
>> doing, and run this command in the same terminal with order 4? Are
>> there any files I need to 'touch' to ensure that it doesn't leave any
>> stone unturned?
>>
>> 2) how to do the next step:
>>
>>
>> bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl -scripts-root-dir bin/moses-scripts/scripts-YYYYMMDD-HHMM -root-dir working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:working-dir/lm/europarl.lm:0
>>
>> I assume that like ngram-count, I can just type in
>> train-factored-phrase-model.perl without the full path...Do I need to
>> set the -scripts-root-dir parameter? Are all the scripts in the same
>> place?
>>
>> Thank you,
>>
>> Llio
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
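
A note on the nohup/nice pattern suggested in the thread: the following is a minimal sketch of how the pieces fit together, not a definitive recipe. The helper name run_in_background and the variable BG_PID are hypothetical and are not part of Moses or SRILM; the ngram-count flags and paths are the ones quoted in the thread. It assumes a POSIX shell, so it uses "> file 2>&1" where the thread's command uses the bash-only "&>".

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses or SRILM): start a long-running
# command in the background at reduced priority, logging stdout and stderr
# to a single file. The PID of the job is left in BG_PID.
run_in_background() {
    log_file=$1
    shift
    # nohup keeps the job alive after the terminal closes;
    # nice lowers its scheduling priority;
    # "> file 2>&1" is the portable spelling of bash's "&> file".
    nohup nice "$@" > "$log_file" 2>&1 &
    BG_PID=$!
}

# Example call, left commented out because it needs SRILM on the PATH.
# The paths are the ones used in the thread; keep the command on one line.
# run_in_background ngram-run.out ngram-count -order 3 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm
# echo "started ngram-count as PID $BG_PID; follow it with: tail -f ngram-run.out"
```

Keeping the whole ngram-count invocation on one line matters: if the shell sees a line break directly after -text, the option is left without an argument, which may be what produced the "-text" warning reported at the top of the thread. Because both output streams go to the log file, nohup should not need to create a separate nohup.out.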
