Dear Eric/Moses Support Group,

I am using Ubuntu with 3.5GB RAM and finally got train-factored-phrase-model.perl to run! I am now on the tuning part of the tutorial, and I'm still using the Baseline data to test out the system on my machine. I adapted the command for tuning from:

bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/mert-moses.pl working-dir/tuning/input working-dir/tuning/reference moses/moses-cmd/src/moses working-dir/model/moses.ini --working-dir working-dir/tuning --rootdir bin/moses-scripts/scripts-YYYYMMDD-HHMM

to

mert-moses.pl europarl/tuning/input europarl/tuning/reference moses model/moses.ini --working-dir europarl/tuning --rootdir /usr/share/moses/scripts >&mert-moses-run.out

I get the error message:

After default: -l mem_free=0.5G -hard
Using SCRIPTS_ROOTDIR: /usr/share/moses/scripts
Not executable: moses at /usr/bin/mert-moses.pl line 297.

mert-moses.pl line 297 is empty but the previous line says:

die "Not executable: $___DECODER" if ! -x $___DECODER;

Grateful for your advice.

Thanks,
Llio Humphreys

On Thu, Aug 14, 2008 at 12:52 PM, Eric Nichols <[EMAIL PROTECTED]> wrote:
> Greetings,
>
> In the moses package, I install everything into /usr/share/moses and
> symlink the scripts and moses command into /usr/bin.
> You can see a list of installed files by running the following command:
>
> # dpkg -L moses
>
> When you call a command like ngram-count or
> train-factored-phrase-model.perl, you do not need to specify the full
> path; the system will be able to find it. I do not know if it is strictly
> necessary to set -scripts-root-dir, but the value
> /usr/share/moses/scripts works fine.
>
> Eric Nichols
>
> On Thu, Aug 14, 2008 at 8:02 PM, Llio Humphreys <[EMAIL PROTECTED]> wrote:
>> Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai,
>> thank you all for your help. It is very, very much appreciated. I
>> decided to try Eric's packages, and it looks like the installation
>> worked. I typed some of the
>> commands in the Baseline instructions without arguments, and the
>> program either output to the screen that I missed some arguments or
>> gave a description of the program. Thank you Eric!!!
>>
>> Following the Baseline instructions
>> (http://www.statmt.org/wmt08/baseline.html) I have now got to the
>> following step:
>>
>> Use SRILM to build language model:
>> /path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm
>>
>> In my case, I was in folder home/llio/MOSESMTDATA. I didn't know the
>> path to ngram-count, but it was possible to invoke it without the
>> path:
>>
>> ngram-count -order 5 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm
>>
>> I'm concerned about two things:
>> 1) this ngram-count step is taking a very long time. I think I started
>> it off around 6pm yesterday, but it's still going. It's very
>> resource-intensive, and it's difficult to get other windows open.
>> I went to check up on it around 9pm, and couldn't find that particular
>> terminal. I thought I had closed that terminal by mistake, so I stupidly
>> opened another one, and entered the same command. I subsequently
>> found that the original terminal was still open, so I closed the
>> second one. I'm not sure if issuing this command a second time on the
>> same program and files on a different terminal would corrupt the
>> original ngram-count step, and whether I should start it off again, or
>> whether starting it off again would make things worse? I looked up
>> ngram-count
>> (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html)
>> and I don't think it outputs to any file, so I guess you have to be in
>> the same terminal to do the next step? I opened
>> another terminal and typed 'top' to see what processes are running,
>> and I know that ngram-count is doing something, but whether it's doing
>> well or stuck in a loop, I can't say. What I do find strange is that
>> the time for ngram-count is said to be 00:58:20, and it's been going
>> for hours.. I searched this problem in previous Moses Group emails and
>> I understand that if I run this with order 4 instead of 5 it will run
>> quicker with very similar results? So, can I just stop what it's
>> doing, and run this command in the same terminal with order 4? Are
>> there any files I need to 'touch' to ensure that it doesn't leave any
>> stone unturned?
>>
>> 2) how to do the next step:
>>
>> bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl -scripts-root-dir bin/moses-scripts/scripts-YYYYMMDD-HHMM -root-dir working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:working-dir/lm/europarl.lm:0
>>
>> I assume that like ngram-count, I can just type in
>> train-factored-phrase-model.perl without the full path...Do I need to
>> set the -scripts-root-dir parameter? Are all the scripts in the same
>> place?
>>
>> Thank you,
>>
>> Llio
>>
>>
>> On 8/14/08, Murat ALPEREN <[EMAIL PROTECTED]> wrote:
>> > Dear Llio,
>> >
>> > You should be okay with installing moses finally if you have installed all
>> > the dependent packages before. I am not aware of the 'whereis' command, but
>> > once you train your model, your moses.ini file which is created by the training
>> > script will take care of the paths. However, you should carefully supply
>> > paths while training your model. Before training your model, you should have
>> > two separate corpus files which are lowercased, sentence aligned and
>> > accordingly tokenized (there are supplementary tools for this). Once you
>> > have your corpus in two separate files such as corpus.en, and corpus.fr you
>> > will run a training perl script: train-factored-phrase-model.perl with various
>> > parameters.
>> > If you need further help with this command after installing
>> > moses and all training scripts, send me a reply including your exact path
>> > for your corpus files and I will try to figure out the training command for
>> > your paths.
>> >
>> > Cheers
>> >
>> >
>> > On 8/13/08, Llio Humphreys <[EMAIL PROTECTED]> wrote:
>> > > Hi Murat,
>> > > thanks for this. I've got Ubuntu 8.04 so the Hardy Heron packages are
>> > > what I need also
>> > > (http://cl.naist.jp/~eric-n/ubuntu-nlp/dists/hardy/all/).
>> > >
>> > > I think I already got the order wrong...(sign of panic maybe?)
>> > > I clicked on mkcls deb and the package installer said it was already installed.
>> > > I clicked on srilm deb and the package installer said it was already
>> > > installed, so I clicked Reinstall package.
>> > >
>> > > I can't find anything that says the order of installation, but note
>> > > that the workshop baseline model requires installing giza before mkcls.
>> > > Do I need to uninstall mkcls (if so how? is it just a matter of
>> > > deleting the .exc file?) or is it enough to click on Reinstall
>> > > package?
>> > >
>> > > When all this is done, how do I use Moses? Many of the commands in
>> > > the baseline model (http://www.statmt.org/wmt08/baseline.html) require
>> > > pathnames to the various scripts and data: is it necessary to amend
>> > > these commands or can I just type 'whereis' command to find what I
>> > > need?
>> > >
>> > > Thanks,
>> > > Llio
>> > >
>> > >
>> > > On Wed, Aug 13, 2008 at 1:48 PM, Murat ALPEREN <[EMAIL PROTECTED]> wrote:
>> > > > Dear Llio,
>> > > >
>> > > > Eric's page will probably help you, I have installed pre-compiled debian
>> > > > based Ubuntu - Hardy Heron packages. All the necessary binaries are included
>> > > > in Eric's repository which will guide you for the dependencies, that means
>> > > > there's an order of installation which you should follow. As far as I
>> > > > remember you should first install srilm, then mkcls, giza and finally moses.
>> > > > Then you will be able to train your models or run any model on your machine
>> > > >
>> > > > Regards
>> > > >
>> > > >
>> > > > On 8/13/08, Anung Ariwibowo <[EMAIL PROTECTED]> wrote:
>> > > >>
>> > > >> Hi Llio,
>> > > >>
>> > > >> I can compile SRILM in Linux Ubuntu without problem. Can you post the
>> > > >> error message here, maybe we can help.
>> > > >>
>> > > >> Cheers,
>> > > >> Anung
>> > > >>
>> > > >> On Wed, Aug 13, 2008 at 8:29 PM, Llio Humphreys <[EMAIL PROTECTED]> wrote:
>> > > >>>
>> > > >>> Dear Josh/Hieu,
>> > > >>> many thanks for your replies. The default shell is bash, and updating
>> > > >>> the .profile file worked - thanks for that tip. I look forward to
>> > > >>> hearing more from you about the ./model/extract.0-0.o.part* problem.
>> > > >>> My apologies for my ignorance of Unix matters: I'd like to think of
>> > > >>> myself as a newbie rather than one who is averse to learning about
>> > > >>> these things, and the further information you have provided has been
>> > > >>> useful and interesting. Hieu mentioned that Anung Ariwibowo got Moses
>> > > >>> to work when he transferred to a Linux machine. A colleague has
>> > > >>> kindly let me borrow a Linux/Ubuntu machine, but I have already run
>> > > >>> into problems compiling SRILM!
>> > > >>> So, I'll see if Eric Nichols's
>> > > >>> packages will take care of that:
>> > > >>> http://cl.naist.jp/~eric-n/ubuntu-nlp/dists/feisty/nlp/
>> > > >>> Best regards,
>> > > >>> Llio
>> > > >>>
>> > > >>>
>> > > >>> On 8/13/08, Josh Schroeder <[EMAIL PROTECTED]> wrote:
>> > > >>> > Hi Llio,
>> > > >>> >
>> > > >>> > > you may have already received my email on the following problem when
>> > > >>> > > building the language model:
>> > > >>> > >
>> > > >>> > > Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
>> > > >>> > > cat: ./model/extract.0-0.o.part*: No such file or directory
>> > > >>> > > Exit code: 1
>> > > >>> > >
>> > > >>> >
>> > > >>> > That's building the phrase table, not the language model. It seems like
>> > > >>> > several people on the list are having problems with this step, so I'm going
>> > > >>> > to take a look at the training process and post something to the list in the
>> > > >>> > next day or two.
>> > > >>> >
>> > > >>> > > 1. You mention that Moses does not use environment variables.
>> > > >>> > > However, in order to get SRILM to work, I found it necessary to create
>> > > >>> > > environment variables and pass these on to SRILM's make:
>> > > >>> > >
>> > > >>> > > make SRILM=$PWD MACHINE_TYPE=macosx
>> > > >>> > >
>> > > >>> > > PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk
>> > > >>> > > MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man
>> > > >>> > > LC_NUMERIC=C
>> > > >>> > >
>> > > >>> > > In addition, I was also required to type in the following command for
>> > > >>> > > moses-scripts:
>> > > >>> > >
>> > > >>> > > export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
>> > > >>> > >
>> > > >>> >
>> > > >>> > Sorry, I should have been more clear. Moses itself, the decoder that loads
>> > > >>> > a trained phrase table and language model and translates text, is a
>> > > >>> > self-contained command-line program that doesn't require environment
>> > > >>> > variables.
>> > > >>> >
>> > > >>> > Your first example is compiling SRILM. This is not part of the Moses
>> > > >>> > toolkit: it's a toolkit of its own for language modeling and a ton of other
>> > > >>> > stuff. We use it as one of two possible integrated language models (the
>> > > >>> > other is IRSTLM) with Moses.
>> > > >>> >
>> > > >>> > Your second example is part of the training regime.
>> > > >>> > Yes, there is some use
>> > > >>> > of SCRIPTS_ROOTDIR in train-factored-phrase-model.perl, but for most training
>> > > >>> > support scripts that come with moses there is a flag that lets you specify
>> > > >>> > SCRIPTS_ROOTDIR at the command line instead of storing it as an environment
>> > > >>> > variable. In train-factored-phrase-model it's "-scripts-root-dir", which I
>> > > >>> > think you've actually used in one of your other emails.
>> > > >>> >
>> > > >>> > > If I open a new terminal and echo these variables, most of them are
>> > > >>> > > blank, and PATH just gives the default bin paths.
>> > > >>> > >
>> > > >>> > > So, how do I make them permanent? I assume that if I want to use
>> > > >>> > > Moses again, it needs to have access to these variables? How can I
>> > > >>> > > ensure that I can close the terminal, go home, open a new terminal the
>> > > >>> > > next day and get Moses working again? A colleague suggested I update
>> > > >>> > > the .bashrc file to update each new terminal session with these
>> > > >>> > > environment variables. However, my Mac system does not appear to have
>> > > >>> > > a .bashrc file as a default, and when I created one in my home
>> > > >>> > > directory and opened a new terminal, it did not access the .bashrc
>> > > >>> > > file.
>> > > >>> > >
>> > > >>> >
>> > > >>> > Here's some info on environment variables on the Mac, found with a quick
>> > > >>> > Google search:
>> > > >>> > http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html
>> > > >>> >
>> > > >>> > I tried it with .profile, that worked fine. Are you sure you're set to use
>> > > >>> > the bash shell? Try ' echo $SHELL ' in Terminal.
>> > > >>> >
>> > > >>> > > 2. You say that you ran the decoder on your laptop just fine, but had
>> > > >>> > > to change a few scripts for training. I have very basic knowledge of
>> > > >>> > > Unix systems and installing open-source software: would it be possible
>> > > >>> > > for you to detail the changes you did to the scripts to get it to run
>> > > >>> > > on a Mac? Although I need this information urgently, it may also be
>> > > >>> > > useful for other students who are installing Moses on a Mac and who
>> > > >>> > > may also have basic knowledge of Unix installation procedures.
>> > > >>> > >
>> > > >>> >
>> > > >>> > I'll look into this. Mac isn't really the platform of choice for training a
>> > > >>> > Moses model and I do most of my work on linux. If I recall correctly, an
>> > > >>> > Intel-based Mac should be easier to get working than a PowerPC one. The
>> > > >>> > *decoder* does work on my Intel-based laptop, but I haven't run a full
>> > > >>> > training setup locally in some time -- most of the time we're working with
>> > > >>> > so much data that I use a cluster of linux machines instead of my Mac.
>> > > >>> >
>> > > >>> > As a word of caution: Moses isn't an out-of-the-box translation solution
>> > > >>> > for end users. It's research software undergoing active development, so
>> > > >>> > almost every user -- on any platform -- will need to muck around in the
>> > > >>> > scripts at some point, or face a compile error or runtime crash. The ability
>> > > >>> > to deal with unix/linux command line tools, and debug code and scripts when
>> > > >>> > necessary, is really important. That being said, I'll see what I can do
>> > > >>> > about highlighting where the scripts might have problems on the Mac.
>> > > >>> >
>> > > >>> > > 3. My final question: which is embarrassingly basic...can I use the one
>> > > >>> > > installation of Moses for different corpora, or do I need to do a
>> > > >>> > > separate installation for each one? Can I have separate installations
>> > > >>> > > of SRILM, Giza and mkcls, or should they all reference the same
>> > > >>> > > libraries?
>> > > >>> > >
>> > > >>> >
>> > > >>> > All you need to do to have moses use different corpora is point it to a
>> > > >>> > different moses.ini file. Assuming you have compiled moses with support for
>> > > >>> > the language model specified in the file (IRSTLM or SRILM), it will
>> > > >>> > translate. You should only need one copy of giza, mkcls, irst/srilm, and
>> > > >>> > moses. The code stays the same, it's the data model that's different.
>> > > >>> >
>> > > >>> > -Josh
>> > > >>> >
>> > > >>> > --
>> > > >>> > The University of Edinburgh is a charitable body, registered in
>> > > >>> > Scotland, with registration number SC005336.
>> > > >>> >
>> > > >>> _______________________________________________
>> > > >>> Moses-support mailing list
>> > > >>> [email protected]
>> > > >>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> > > >>>
>> > > >>
>> > > >> --
>> > > >> barliant at {gmail.com, yahoo.com}
>> > > >> Starting July 2008, barliant at cbn.net.id is no longer active
>> > > >> Visit my Blog at barliant dot blogspot dot com
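
A note on the "Not executable: moses" error reported at the top of this thread. The line quoted from mert-moses.pl is a plain Perl -x file test, so as far as I can tell the decoder argument is checked as a file path relative to the current directory and is not looked up on PATH. Typing moses at a prompt works because the shell searches PATH, but the bare name "moses" fails the -x test unless a file of that name sits in the directory you run from. Below is a minimal shell sketch that resolves the name to a full path first. It assumes the package symlinks moses into /usr/bin as Eric describes; resolve_decoder is a hypothetical helper written for this note, not something shipped with Moses.

```shell
# Hypothetical helper (not part of Moses): turn a decoder name into a path
# that would pass a "-x" file test like the one quoted from mert-moses.pl.
resolve_decoder() {
    name="$1"
    case "$name" in
        # A name containing a slash is already a path; use it unchanged.
        */*) candidate="$name" ;;
        # A bare name is looked up on PATH, the way the shell would do it.
        *)   candidate="$(command -v "$name" 2>/dev/null)" ;;
    esac
    if [ -n "$candidate" ] && [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        echo "Not executable: $name" >&2
        return 1
    fi
}

# Possible use with the tuning command from this thread (left commented out
# so that the sketch runs on a machine without Moses installed):
#
#   DECODER="$(resolve_decoder moses)" &&
#   mert-moses.pl europarl/tuning/input europarl/tuning/reference \
#       "$DECODER" model/moses.ini \
#       --working-dir europarl/tuning --rootdir /usr/share/moses/scripts
```

If the helper prints a path such as /usr/bin/moses, passing that path in place of the bare name should get past the check on line 296. If it prints "Not executable" instead, the decoder is probably not on PATH at all, and dpkg -L moses should show where the package put it.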
