Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai,

Thank you all for your help. It is very much appreciated.

I decided to try Eric's packages, and it looks like the installation worked. I typed some of the commands from the Baseline instructions without arguments, and each program either reported that arguments were missing or printed a description of itself. Thank you Eric!!!

Following the Baseline instructions (http://www.statmt.org/wmt08/baseline.html), I have now reached the following step:

Use SRILM to build language model:

    /path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm

In my case, I was in the folder home/llio/MOSESMTDATA. I didn't know the path to ngram-count, but it was possible to invoke it without the path:

    ngram-count -order 5 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm

I'm concerned about two things:

1) This ngram-count step is taking a very long time. I think I started it around 6pm yesterday, and it's still going. It's very resource-intensive, and it's difficult to switch to other open windows. I went to check on it around 9pm and couldn't find that particular terminal. I thought I had closed it by mistake, so I stupidly opened another one and entered the same command. I subsequently found that the original terminal was still open, so I closed the second one. I'm not sure whether issuing this command a second time, on the same files from a different terminal, would corrupt the original ngram-count run, and whether I should start it again, or whether starting it again would make things worse.

I looked up ngram-count (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html) and I don't think it outputs to any file, so I guess you have to be in the same terminal to do the next step? I opened another terminal and typed 'top' to see which processes are running, and I can see that ngram-count is doing something, but whether it's making progress or stuck in a loop, I can't say. What I do find strange is that the time shown for ngram-count is 00:58:20, when it's been going for hours.

I searched previous Moses Group emails for this problem, and I understand that if I run this with order 4 instead of 5 it will run more quickly with very similar results?
So, can I just stop what it's doing and run this command in the same terminal with order 4? Are there any files I need to 'touch' to ensure that it doesn't leave any stone unturned?

2) How to do the next step:

    bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/train-factored-phrase-model.perl -scripts-root-dir bin/moses-scripts/scripts-YYYYMMDD-HHMM -root-dir working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:working-dir/lm/europarl.lm:0

I assume that, like ngram-count, I can just type train-factored-phrase-model.perl without the full path... Do I need to set the -scripts-root-dir parameter? Are all the scripts in the same place?

Thank you,
Llio

On 8/14/08, Murat ALPEREN <[EMAIL PROTECTED]> wrote:
> Dear Llio,
>
> You should be okay with installing moses finally if you have installed
> all the dependent packages before. I am not aware of the 'whereis'
> command, but once you train your model, your moses.ini file which is
> created by training script will take care of the paths. However, you
> should carefully supply paths while training your model. Before training
> your model, you should have two separate corpus files which are
> lowercased, sentence aligned and accordingly tokenized (there are
> supplementary tools for this). Once you have your corpus in two separate
> files such as corpus.en, and corpus.fr you will run a training perl
> script: train-factored-phrase-model.perl with various parameters. If you
> need further help with this command after installing moses and all
> training scripts, send me a reply including your exact path for your
> corpus files and I will try to figure out the training command for your
> paths.
>
> Cheers
>
> On 8/13/08, Llio Humphreys <[EMAIL PROTECTED]> wrote:
> > Hi Murat,
> > thanks for this. I've got Ubuntu 8.04 so the Hardy Heron packages are
> > what I need also
> > (http://cl.naist.jp/~eric-n/ubuntu-nlp/dists/hardy/all/).
> >
> > I think I already got the order wrong...(sign of panic maybe?)
> > I clicked on mkcls deb and the package installer said it was already
> > installed.
> > I clicked on srilm deb and the package installer said it was already
> > installed, so I clicked Reinstall package.
> >
> > I can't find anything that says the order of installation, but note
> > that the workshop baseline model requires installing giza before mkcls.
> > Do I need to uninstall mkcls (if so how? is it just a matter of
> > deleting the .exc file?) or is it enough to click on Reinstall
> > package?
> >
> > When all this is done, how do I use Moses? Many of the commands in
> > the baseline model (http://www.statmt.org/wmt08/baseline.html) require
> > pathnames to the various scripts and data: is it necessary to amend
> > these commands or can I just type 'whereis' command to find what I
> > need?
> >
> > Thanks,
> > Llio
> >
> > On Wed, Aug 13, 2008 at 1:48 PM, Murat ALPEREN <[EMAIL PROTECTED]> wrote:
> > > Dear Llio,
> > >
> > > Eric's page will probably help you, I have installed pre-compiled
> > > debian based Ubuntu - Hardy Heron packages. All the necessary
> > > binaries are included in Eric's repository which will guide you for
> > > the dependencies, that means there's an order of installation which
> > > you should follow. As far as I remember you should first install
> > > srilm, then mkcls, giza and finally moses. Then you will be able to
> > > train your models or run any model on your machine.
> > >
> > > Regards
> > >
> > > On 8/13/08, Anung Ariwibowo <[EMAIL PROTECTED]> wrote:
> > >> Hi Llio,
> > >>
> > >> I can compile SRILM in Linux Ubuntu without problem. Can you post the
> > >> error message here, maybe we can help.
> > >>
> > >> Cheers,
> > >> Anung
> > >>
> > >> On Wed, Aug 13, 2008 at 8:29 PM, Llio Humphreys <[EMAIL PROTECTED]> wrote:
> > >>> Dear Josh/Hieu,
> > >>> many thanks for your replies.
> > >>> The default shell is bash, and updating the .profile file worked -
> > >>> thanks for that tip. I look forward to hearing more from you about
> > >>> the ./model/extract.0-0.o.part* problem.
> > >>> My apologies for my ignorance of Unix matters: I'd like to think of
> > >>> myself as a newbie rather than one who is averse to learning about
> > >>> these things, and the further information you have provided has been
> > >>> useful and interesting. Hieu mentioned that Anung Ariwibowo got Moses
> > >>> to work when he transferred to a Linux machine. A colleague has
> > >>> kindly let me borrow a Linux/Ubuntu machine, but I have already run
> > >>> into problems compiling SRILM! So, I'll see if Eric Nichols's
> > >>> packages will take care of that:
> > >>> http://cl.naist.jp/~eric-n/ubuntu-nlp/dists/feisty/nlp/
> > >>> Best regards,
> > >>> Llio
> > >>>
> > >>> On 8/13/08, Josh Schroeder <[EMAIL PROTECTED]> wrote:
> > >>> > Hi Llio,
> > >>> >
> > >>> > > you may have already received my email on the following problem
> > >>> > > when building the language model:
> > >>> > >
> > >>> > > Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
> > >>> > > cat: ./model/extract.0-0.o.part*: No such file or directory
> > >>> > > Exit code: 1
> > >>> >
> > >>> > That's building the phrase table, not the language model. It seems
> > >>> > like several people on the list are having problems with this step,
> > >>> > so I'm going to take a look at the training process and post
> > >>> > something to the list in the next day or two.
> > >>> >
> > >>> > > 1. You mention that Moses does not use environment variables.
> > >>> > > However, in order to get SRILM to work, I found it necessary to
> > >>> > > create environment variables and pass these on to SRILM's make:
> > >>> > >
> > >>> > > make SRILM=$PWD MACHINE_TYPE=macosx
> > >>> > >
> > >>> > > PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk
> > >>> > > MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man
> > >>> > > LC_NUMERIC=C
> > >>> > >
> > >>> > > In addition, I was also required to type in the following command
> > >>> > > for moses-scripts:
> > >>> > >
> > >>> > > export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
> > >>> >
> > >>> > Sorry, I should have been more clear. Moses itself, the decoder that
> > >>> > loads a trained phrase table and language model and translates text,
> > >>> > is a self-contained command-line program that doesn't require
> > >>> > environment variables.
> > >>> >
> > >>> > Your first example is compiling SRILM. This is not part of the Moses
> > >>> > toolkit: it's a toolkit of its own for language modeling and a ton
> > >>> > of other stuff. We use it as one of two possible integrated language
> > >>> > models (the other is IRSTLM) with Moses.
> > >>> >
> > >>> > Your second example is part of the training regime. Yes, there is
> > >>> > some use of the SCRIPTS_ROOTDIR in the
> > >>> > train-factored-phrase-model.perl, but for most training support
> > >>> > scripts that come with moses there is a flag that lets you specify
> > >>> > SCRIPTS_ROOTDIR at the command line instead of storing it as an
> > >>> > environment variable. In train-factored-phrase-model it's
> > >>> > "-scripts-root-dir", which I think you've actually used in one of
> > >>> > your other emails.
> > >>> >
> > >>> > > If I open a new terminal and echo these variables, most of them
> > >>> > > are blank, and PATH just gives the default bin paths.
> > >>> > >
> > >>> > > So, how do I make them permanent? I assume that if I want to use
> > >>> > > Moses again, it needs to have access to these variables? How can
> > >>> > > I ensure that I can close the terminal, go home, open a new
> > >>> > > terminal the next day and get Moses working again? A colleague
> > >>> > > suggested I update the .bashrc file to update each new terminal
> > >>> > > session with these environment variables. However, my Mac system
> > >>> > > does not appear to have a .bashrc system as a default, and when I
> > >>> > > created one in my home directory and opened a new terminal, it
> > >>> > > did not access the .bashrc file.
> > >>> >
> > >>> > Here's some info on environment variables on the Mac, found with a
> > >>> > quick Google search:
> > >>> > http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html
> > >>> >
> > >>> > I tried it with .profile, that worked fine. Are you sure you're set
> > >>> > to use the bash shell? Try ' echo $SHELL ' in Terminal.
> > >>> >
> > >>> > > 2. You say that you ran the decoder on your laptop just fine, but
> > >>> > > had to change a few scripts for training. I have very basic
> > >>> > > knowledge of Unix systems and installing open-source software:
> > >>> > > would it be possible for you to detail the changes you did to the
> > >>> > > scripts to get it to run on a Mac? Although I need this
> > >>> > > information urgently, it may also be useful for other students
> > >>> > > who are installing Moses on a Mac and who may also have basic
> > >>> > > knowledge of Unix installation procedures.
> > >>> >
> > >>> > I'll look into this.
> > >>> > Mac isn't really the platform of choice for training a Moses model
> > >>> > and I do most of my work on linux. If I recall correctly, an
> > >>> > Intel-based Mac should be easier to get working than a PowerPC one.
> > >>> > The *decoder* does work on my Intel-based laptop, but I haven't run
> > >>> > a full training setup locally in some time -- most of the time
> > >>> > we're working with so much data that I use a cluster of linux
> > >>> > machines instead of my Mac.
> > >>> >
> > >>> > As a word of caution: Moses isn't an out-of-the box translation
> > >>> > solution for end users. It's research software undergoing active
> > >>> > development, so almost every user -- on any platform -- will need
> > >>> > to muck around in the scripts at some point, or face a compile
> > >>> > error or runtime crash. The ability to deal with unix/linux command
> > >>> > line tools, and debug code and scripts when necessary, is really
> > >>> > important. That being said, I'll see what I can do about
> > >>> > highlighting where the scripts might have problems on the Mac.
> > >>> >
> > >>> > > 3. My final question: which is embarrassingly basic...can I use
> > >>> > > the one installation of Moses for different corpora, or do I need
> > >>> > > to do a separate installation for each one? Can I have separate
> > >>> > > installations of SRILM, Giza and mkcls, or should they all
> > >>> > > reference the same libraries?
> > >>> >
> > >>> > All you need to do to have moses use different corpora is point it
> > >>> > to a different moses.ini file. Assuming you have compiled moses
> > >>> > with support for the language model specified in the file (IRSTLM
> > >>> > or SRILM), it will translate. You should only need one copy of
> > >>> > giza, mkcls, irst/srilm, and moses. The code stays the same, it's
> > >>> > the data model that's different.
> > >>> >
> > >>> > -Josh

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
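
A note on the ngram-count questions in this thread, offered as a minimal sketch rather than a tested recipe. The shell functions below print the ngram-count command for a chosen order (so that 4 can be swapped in for 5) and use the POSIX 'command -v' builtin to report where a tool lives on the PATH. The names lm_command and find_tool are hypothetical helpers introduced only for illustration. The ngram-count options are the ones quoted from the Baseline instructions, and the file paths are the ones used in the message above.

```shell
#!/bin/sh
# Sketch only: lm_command and find_tool are illustrative helpers,
# not part of SRILM or Moses.

# Print the ngram-count command for a given order, input text and output LM,
# so it can be inspected before it is actually run.
lm_command() {
    printf 'ngram-count -order %s -interpolate -kndiscount -text %s -lm %s\n' \
        "$1" "$2" "$3"
}

# Print the full path of a tool if it is on PATH; otherwise say so and fail.
find_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        printf '%s\n' "$tool_path"
    else
        printf '%s: not found on PATH\n' "$1"
        return 1
    fi
}

# Show the order-5 command from the message, then look for ngram-count.
lm_command 5 europarl/lm/europarl.lowercased europarl/lm/europarl.lm
find_tool ngram-count || true
```

The -lm option in that command names the file the language model is written to, so the result does not depend on keeping a particular terminal in view. One option, if the job has to be restarted, is to launch the printed command under nohup with its messages redirected to a log file of your choosing.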
