Re: [Moses-support] Moses: Prepare Data, Build Language Model and Train Model

Josh Schroeder Wed, 13 Aug 2008 03:54:25 -0700

Hi Llio,

> you may have already received my email on the following problem when
> building the language model:
>
> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
> cat: ./model/extract.0-0.o.part*: No such file or directory
> Exit code: 1


That's building the phrase table, not the language model. It seems  
like several people on the list are having problems with this step, so  
I'm going to take a look at the training process and post something to  
the list in the next day or two.

>
> 1. You mention that Moses does not use environment variables.
> However, in order to get SRILM to work, I found it necessary to create
> environment variables and pass these on to SRILM's make:
>
> make SRILM=$PWD MACHINE_TYPE=macosx
> PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk
> MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man LC_NUMERIC=C
>
> In addition, I was also required to type in the following command for
> moses-scripts:
>
> export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
>

Sorry, I should have been more clear. Moses itself, the decoder that  
loads a trained phrase table and language model and translates text,  
is a self-contained command-line program that doesn't require  
environment variables.

Your first example is compiling SRILM. This is not part of the Moses  
toolkit: it's a toolkit of its own for language modeling and a ton of  
other stuff. We use it as one of two possible integrated language  
models (the other is IRSTLM) with Moses.

Your second example is part of the training regime. Yes,
train-factored-phrase-model.perl does make some use of SCRIPTS_ROOTDIR,
but most of the training support scripts that come with Moses have a
flag that lets you specify the scripts root on the command line instead
of storing it in an environment variable. In
train-factored-phrase-model.perl it's "-scripts-root-dir", which I think
you've actually used in one of your other emails.
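
To make the flag-versus-variable point concrete, here is a minimal
sketch of a wrapper that always passes the scripts root on the command
line. The path is copied from your export above; the training/
subdirectory, the TRAIN_SCRIPT variable and the run_training function
are assumptions for illustration, not something the scripts provide.

```shell
# Assumed install location, copied from the quoted export; adjust as needed.
SCRIPTS_ROOT="/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801"

# Assumption: the training script lives in the training/ subdirectory.
TRAIN_SCRIPT="${TRAIN_SCRIPT:-$SCRIPTS_ROOT/training/train-factored-phrase-model.perl}"

run_training() {
  # -scripts-root-dir stands in for the SCRIPTS_ROOTDIR environment variable.
  # All other training options are passed through unchanged from the caller.
  "$TRAIN_SCRIPT" -scripts-root-dir "$SCRIPTS_ROOT" "$@"
}
```

You would then call run_training followed by your usual training
options, with no export needed in that terminal session.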


> If I open a new terminal and echo these variables, most of them are
> blank, and PATH just gives the default bin paths.
>
> So, how do I make them permanent?  I assume that if I want to use
> Moses again, it needs to have access to these variables?  How can I
> ensure that I can close the terminal, go home, open a new terminal the
> next day and get Moses working again?  A colleague suggested I update
> the .bashrc file to update each new terminal session with these
> environment variables. However, my Mac system does not appear to have
> a .bashrc system as a default, and when I created one in my home
> directory and opened a new terminal, it did not access the .bashrc
> file.

Here's some info on environment variables on the Mac, found with a  
quick Google search:
http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html

I tried it with .profile, and that worked fine. Are you sure you're set
to use the bash shell? Try 'echo $SHELL' in Terminal.
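
For reference, Terminal starts bash as a login shell, and a login shell
reads ~/.bash_profile, ~/.bash_login or ~/.profile (the first one it
finds) rather than ~/.bashrc, which is probably why your .bashrc was
ignored. A minimal sketch of the lines you might put in ~/.profile,
assuming the install paths from your message (MOSESSUITE is a
hypothetical helper variable, not something SRILM or Moses looks for):

```shell
# Lines for ~/.profile. Paths are copied from the quoted message; adjust
# them to your own install. MOSESSUITE is only a local convenience variable.
MOSESSUITE="/Users/lliohumphreys/MT/MOSESSUITE"

# SRILM location and binaries (macosx is the MACHINE_TYPE used at build time).
export SRILM="$MOSESSUITE/srilm"
export PATH="$PATH:$SRILM/bin:$SRILM/bin/macosx"
export LC_NUMERIC=C

# Only needed if you don't pass -scripts-root-dir to the training scripts.
export SCRIPTS_ROOTDIR="$MOSESSUITE/bin/moses-scripts/scripts-20080811-1801"
```

Open a new Terminal window afterwards and 'echo $SCRIPTS_ROOTDIR' should
print the path. Note that PATH entries are directories, so an entry like
/sw/bin/gawk would need to be /sw/bin to have any effect.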

> 2. You say that you ran the decoder on your laptop just fine, but had
> to change a few scripts for training.  I have very basic knowledge of
> Unix systems and installing open-source software: would it be possible
> for you to detail the changes you did to the scripts to get it to run
> on a Mac?  Although I need this information urgently, it may also be
> useful for other students who are installing Moses on a Mac and who
> may also have basic knowledge of Unix installation procedures.

I'll look into this. The Mac isn't really the platform of choice for
training a Moses model, and I do most of my work on Linux. If I recall
correctly, an Intel-based Mac should be easier to get working than a
PowerPC one. The *decoder* does work on my Intel-based laptop, but I
haven't run a full training setup locally in some time -- most of the
time we're working with so much data that I use a cluster of Linux
machines instead of my Mac.

As a word of caution: Moses isn't an out-of-the-box translation
solution for end users. It's research software undergoing active
development, so almost every user -- on any platform -- will need to
muck around in the scripts at some point, or face a compile error or
runtime crash. The ability to deal with Unix/Linux command-line tools,
and to debug code and scripts when necessary, is really important. That
being said, I'll see what I can do about highlighting where the
scripts might have problems on the Mac.

> 3. My final question: which is embarrassingly basic...can I use the one
> installation of Moses for different corpora, or do I need to do a
> separate installation for each one?  Can I have separate installations
> of SRILM, Giza and mkcls, or should they all reference the same
> libraries?

All you need to do to have Moses use a different corpus is point it to
a different moses.ini file. Assuming you have compiled Moses with
support for the language model specified in the file (IRSTLM or
SRILM), it will translate. You should only need one copy each of
GIZA++, mkcls, IRSTLM/SRILM, and Moses. The code stays the same; it's
the trained model that's different.
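
As a rough sketch of what that looks like in practice: one moses binary,
one model directory per corpus, and the -f option choosing the config
file. The directory names, the MOSES_BIN variable and the translate
function are made up for illustration; the decoder reads source text on
standard input and writes translations to standard output.

```shell
# One shared decoder binary; override MOSES_BIN if moses is not on your PATH.
MOSES_BIN="${MOSES_BIN:-moses}"

translate() {
  # $1 = the moses.ini produced by training on a particular corpus.
  # Source sentences come in on stdin, translations go out on stdout.
  "$MOSES_BIN" -f "$1"
}

# Usage (hypothetical paths, one trained model per corpus):
#   translate corpus-a/model/moses.ini < input-a.txt > output-a.txt
#   translate corpus-b/model/moses.ini < input-b.txt > output-b.txt
```

Each moses.ini points at its own phrase table and language model files,
so nothing needs to be reinstalled or recompiled when you switch.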

-Josh


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
