Dear Josh,
You may have already received my email about the following problem when
building the language model:

Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
cat: ./model/extract.0-0.o.part*: No such file or directory
Exit code: 1

So as not to conflate issues, I will ask my other questions in this
separate email. I address them primarily to you because they may
be Mac-related, and you have successfully installed Moses on a Mac.
I have attached my history file of commands entered. It will be clear
that I tried to install this in two separate folders, and the second
installation worked up to a point.

1. You mention that Moses does not use environment variables.
However, in order to get SRILM to work, I found it necessary to create
environment variables and pass these on to SRILM's make:

make SRILM=$PWD MACHINE_TYPE=macosx \
  PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk \
  MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man LC_NUMERIC=C

In addition, I was also required to type in the following command for
moses-scripts:

export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801

If I open a new terminal and echo these variables, most of them are
blank, and PATH just gives the default bin paths.

So, how do I make them permanent?  I assume that if I want to use
Moses again, it needs to have access to these variables?  How can I
ensure that I can close the terminal, go home, open a new terminal the
next day and get Moses working again?  A colleague suggested I update
the .bashrc file to update each new terminal session with these
environment variables. However, my Mac does not appear to have
a .bashrc file by default, and when I created one in my home
directory and opened a new terminal, the session did not read the .bashrc
file.
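
For concreteness, here is a minimal sketch of one way the settings might be
made permanent. It assumes that the Mac Terminal starts bash as a login
shell, which reads ~/.bash_profile rather than ~/.bashrc. The name
write_moses_env is a hypothetical helper, and the paths are the ones from
the commands above.

```shell
# Sketch only: append the SRILM/Moses settings to a shell startup file.
# Assumption: login shells on the Mac read ~/.bash_profile, not ~/.bashrc.

# write_moses_env FILE: append the export lines to FILE (hypothetical helper).
write_moses_env() {
    # The quoted 'EOF' keeps $PATH, $SRILM etc. literal in the file, so they
    # are expanded each time a new shell reads the file, not now.
    cat >> "$1" <<'EOF'
# Moses / SRILM settings
export SRILM=/Users/lliohumphreys/MT/MOSESSUITE/srilm
export MACHINE_TYPE=macosx
export PATH="$PATH:$SRILM/bin:$SRILM/bin/$MACHINE_TYPE"
export MANPATH="$MANPATH:$SRILM/man"
export LC_NUMERIC=C
export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
EOF
}

# Example call (not run here, because it edits a real startup file):
#   write_moses_env "$HOME/.bash_profile"
```

After running write_moses_env "$HOME/.bash_profile" once, a new terminal
should print the path for echo $SCRIPTS_ROOTDIR. If a ~/.bashrc is
preferred, a line such as [ -f ~/.bashrc ] && . ~/.bashrc inside
~/.bash_profile makes login shells read it as well.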

2. You say that you ran the decoder on your laptop just fine, but had
to change a few scripts for training.  I have very basic knowledge of
Unix systems and installing open-source software: would it be possible
for you to detail the changes you did to the scripts to get it to run
on a Mac?  Although I need this information urgently, it may also be
useful for other students who are installing Moses on a Mac and who
may also have basic knowledge of Unix installation procedures.

3. My final question, which is embarrassingly basic: can I use the one
installation of Moses for different corpora, or do I need to do a
separate installation for each one?  Can I have separate installations
of SRILM, GIZA++ and mkcls, or should they all reference the same
libraries?

Thank you for your help and patience,
Kind regards,
Llio Humphreys



On 7/25/08, Josh Schroeder <[EMAIL PROTECTED]> wrote:
> Hi Llio,
>
>  You've got a lot of questions spread around in this message. I'll try to
> get to most of them.
>
>
> >
> > >
> > > Dear Moses Group,
> > >
> > > I am having difficulties running the Moses software (not the recently
> > > released version), following the guidelines at
> > > http://www.statmt.org/wmt07/baseline.html and I attach a record of the
> > > final part of the terminal session for your information.
> > >
> > > I started with parallel input files, with each line containing one
> > > sentence, both already tokenised, tab delimited, and in ASCII (is
> > > UTF-8 better?)
> > >
> >
>
>  Moses itself is encoding-agnostic - use whatever encoding you want. Some of
> the support scripts on statmt.org (tokenizer.perl, for example) are geared
> to work better with UTF-8.  I find UTF-8 a lot easier to use -- especially
> when you start dealing with multiple language pairs with different native
> encodings.
>
>
> >
> > > I followed the instructions under the Prepare Data heading.  I briefly
> > > inspected the .tok output files, and preferred the original tokenised
> > > version e.g. reference numbers with / were not split up.  So, I
> > > renamed the original input files as .tok files, filtered out long
> > > sentences and lowercased the training data.
> > >
> >
>
>  I think you're saying you didn't like the behavior of our sample tokenizer
> with regard to some feature in the training data. If your original files
> are already tokenized in some way, you can just use that data instead of
> re-applying tokenization. Some form of tokenization is definitely important
> though: you don't want "no," "no!" "no." and "no?" to all be treated as
> distinct words instead of multiple instances of the word "no".
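
The effect described above can be seen with a small sketch. The sed rule
here is a deliberately crude stand-in for a real tokenizer such as
tokenizer.perl, and split_punct and count_word are hypothetical helper
names.

```shell
# Sketch only: count how often the word "no" is seen with and without
# splitting punctuation away from the preceding word.

# split_punct: put a space before , ! . ? (crude stand-in for a tokenizer).
split_punct() {
    sed 's/\([,!.?]\)/ \1/g'
}

# count_word WORD: split stdin on spaces, one token per line, and count the
# tokens that are exactly WORD.
count_word() {
    tr -s ' ' '\n' | grep -cx -- "$1"
}

text='no, no! no. no?'

printf '%s\n' "$text" | count_word no                 # prints 0
printf '%s\n' "$text" | split_punct | count_word no   # prints 4
```

Without the split, "no," "no!" "no." and "no?" are four different tokens
and the bare word "no" never occurs; after the split it occurs four times.
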
>
>
> >
> > > I then proceeded to the Language Model. The instructions seemed pretty
> > > much the same as for the Prepare Data section, so I moved the
> > > lowercased files from the corpus directory to the lm directory. Is
> > > this the right thing to do?
> > >
> >
>
>  This is an *acceptable* thing to do, but maybe not the best choice. More
> data for language models is always better. When we make the Europarl data
> parallel for a given language pair, we drop mis-matched sentences,
> paragraphs, even whole documents that don't have a version in both
> languages. In the Prepare Data section, as you mentioned, we filter out long
> sentences. All of that dropped data on the target side can be useful to the
> language model. That's why a non-paired monolingual .en file is used in the
> example, and is only tokenized and lowercased, not filtered for long
> sentences.
>
>
> >
> > > I then trained the model and the system crashed with the following
> > > message:-
> > >
> > > Executing:
> > > bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract
> > > ./model/aligned.0.en ./model/aligned.0.cy
> > > ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
> > > PhraseExtract v1.3.0, written by Philipp Koehn
> > > phrase extraction from an aligned parallel corpus
> > > (also extracting orientation)
> > > Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
> > > cat: ./model/extract.0-0.o.part*: No such file or directory
> > > Exit code: 1
> > > Died at
> > > bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl
> > > line 899.
> > >
> > > So, my question is: am I giving Moses the wrong data to work with?
> > >
> >
>
>  I think it's more likely that some file is misplaced (you say you 'moved'
> the lowercased files to the lm directory - did you copy them or delete
> them?) or that some part of the train-factored-phrase-model.perl process
> isn't running correctly. The full stdout/stderr of the perl script should
> help you debug
> what is getting done and what is failing. The "Executing:" calls are just
> copies of what is sent to the command line, so you can always try copy and
> pasting that and running it yourself outside of the perl script to debug
> what's going wrong. You've got the perl script, too, so poke around inside
> it and figure out what it's doing. That's the beauty of open-source. :)
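
Re-running the failing "Executing:" step by hand, as suggested above, might
look like the sketch below. It checks that the inputs exist before calling
cat. The name concat_parts is a hypothetical helper and is not part of the
Moses scripts; the paths in the example call are the ones from the error
message.

```shell
# Sketch only: join the extract part files by hand, checking inputs first.

# concat_parts PREFIX OUT: join every file whose name starts with PREFIX
# into OUT, or explain why that is not possible.
concat_parts() {
    cp_prefix=$1
    cp_out=$2
    # Expand the glob into the positional parameters. If nothing matches,
    # the pattern stays literal and the -e test below fails.
    set -- "$cp_prefix"*
    if [ ! -e "$1" ]; then
        echo "no files match ${cp_prefix}* (did the extract step write anything?)" >&2
        return 1
    fi
    cat "$@" > "$cp_out"
    echo "joined $# part file(s) into $cp_out"
}

# Example call (not run here):
#   concat_parts ./model/extract.0-0.o.part ./model/extract.0-0.o
```

If the helper reports that nothing matches, the problem lies in the earlier
extract command rather than in cat, and that command can be copied from
the log and run on its own in the same way.
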
>
>
> >
> > > In order to find out, I downloaded europarl from
> > > http://www.statmt.org/europarl/.  It contained version 2 rather than
> > > version 3 but I thought nevertheless that I might try using it.  I ran
> > > sentence-align-corpus.perl:
> > >
> >
>
>  The downloads from that page contain version 3, not v2. What made you think
> it was version 2? Maybe we missed a readme somewhere, but the data is v3 for
> sure.
>
>
> >
> > > ./sentence-align-corpus.perl en de
> > >
> > > , but it exited with the following message:
> > >
> > > Died at ./sentence-align-corpus.perl line 16.
> > >
> > > sentence-align-corpus.perl line 16 says:
> > > die unless -e "$dir/$l1";
> > >
> >
>
>  Yeah, there was a bug in sentence-align-corpus. Line 9 should read
>
>  my $dir = "txt";
>
>  It was looking in the wrong directory. You can either fix your version or
> re-download the tools.tgz file from the Europarl page.
>
>
> >
> > > Should I continue with europarl 2 or is it possible to download
> > > europarl 3 from somewhere?
> > >
> >
>
>  See above. v3 is what is available. v2 is available in an archive page at
> <http://www.statmt.org/europarl/archives.html>
>
>
> >
> > > Alternatively would it be possible for you to explain the difference
> > > in purpose and format between wmt07/training/europarl-v3.fr-en.fr and
> > > wmt07/training/europarl-v3.en?
> > >
> >
>
>  You can get the files that tutorial is talking about from
> <http://www.statmt.org/wmt07/shared-task.html#download> and
> look through them yourself. The europarl-v3.fr-en.* files come in a pair.
> There should be europarl-v3.fr-en.en and europarl-v3.fr-en.fr.  All 3 files
> have one sentence per line, europarl-v3.fr-en.en and europarl-v3.fr-en.fr
> have an identical number of lines, and europarl-v3.en has a superset of the
> europarl-v3.fr-en.en data. Expanding on what I said about LM data above,
> more data can go into the non-paired file because we don't have to match
> documents across two languages. We need paired data for word alignments, but
> any monolingual target data is useful for language modeling.
>
>
> >
> > > Just to clarify: am I correct in
> > > saying that the Prepare Data section is about training the translation
> > > model i.e. word and phrase alignments, and Language model section is
> > > about creating a language model for the language we're translating to?
> > >
> >
>
>  Correct.
>
>
> >
> > > Does the Prepare Data section start with two plain text parallel
> > > corpora with sentences on each line or  is something more elaborate
> > > than that?  Maybe the wmt07/training/europarl-v3.fr-en.fr is a plain
> > > text file with French sentence 1 followed by English sentence 1
> > > followed by French sentence 2 followed by English sentence 2 etc?  I
> > > could then adapt the Welsh-English corpus I'm using accordingly.
> > >
> >
>
>  These paired files should have exactly the same number of lines. Line 1 in
> .en and Line 1 in .fr should be the same sentence, one file in English and
> one in French. These are the results of running sentence-align-corpus,
> combining all the files for each language, and filtering out the lines with
> XML tags. If you want to play with prepared files and not "roll your own"
> from the Europarl data, check out the wmt07 and wmt08 websites for
> downloadable monolingual and parallel training data.
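
The requirement that the paired files have exactly the same number of lines
can be checked before training with a sketch like this. The name
same_line_count is a hypothetical helper; the file names in the example
call are the ones discussed above.

```shell
# Sketch only: verify that two parallel corpus files have equal line counts.

# same_line_count FILE1 FILE2: succeed only if both files have the same
# number of lines.
same_line_count() {
    # "wc -l < FILE" prints only the number; $(( )) strips any padding
    # that some wc implementations add.
    n1=$(( $(wc -l < "$1") ))
    n2=$(( $(wc -l < "$2") ))
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 lines in each file"
    else
        echo "MISMATCH: $1 has $n1 lines, $2 has $n2" >&2
        return 1
    fi
}

# Example call (not run here):
#   same_line_count europarl-v3.fr-en.fr europarl-v3.fr-en.en
```

A mismatch means the sentence alignment has slipped somewhere. An equal
count is necessary but does not prove that line N in one file translates
line N in the other.
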
>
>
> >
> > > Otherwise, is there a problem with the software/implementation on a
> > > Mac system? Would you recommend that I try the recently released
> > > version of Moses?  Is there some way to install the new version of
> > > Moses without uninstalling the other one (I'm wondering about
> > > environment variables)
> > >
> >
>
>  I've run the decoder on my mac laptop just fine. You may have to change a
> few scripts for training - for example, I know the mac uses 'gzcat' instead
> of 'zcat'. Moses doesn't use environment variables. Compile it in a
> different directory and you've got a second copy!
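
The zcat/gzcat difference mentioned above could be absorbed with a small
check like this sketch. pick_zcat and ZCAT are hypothetical names, not part
of the Moses scripts, and the training scripts would still need to be
edited to call whichever command is found.

```shell
# Sketch only: choose a command that can read .gz files on this machine.

# pick_zcat: print the name of a usable decompress-to-stdout command,
# preferring gzcat (the Mac name) and falling back to zcat, then gzip -cd.
pick_zcat() {
    if command -v gzcat >/dev/null 2>&1; then
        echo gzcat
    elif command -v zcat >/dev/null 2>&1; then
        echo zcat
    else
        echo "gzip -cd"
    fi
}

ZCAT=$(pick_zcat)

# $ZCAT is left unquoted on purpose so that "gzip -cd" splits into a
# command and its flags.
printf 'hello\n' | gzip -c | $ZCAT
```

Calling gzip -cd everywhere would also work on both systems; the check is
only useful when a script has to keep using one of the two short names.
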
>
>
>  Good luck!
>
>  Josh
>
>  --
>  The University of Edinburgh is a charitable body, registered in
>  Scotland, with registration number SC005336.
>
>
    1  GCC
    2  gcc --version
    3  wish
    4  gcc --version
    5  cd MTRESEARCH/MOSES08/srilm
    6  pwd
    7  gnumake World
    8  PATH=$PATH:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin
    9  echo $PATH
   10  MANPATH=Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   11  echo $MANPATH
   12  cd test
   13  gnumake all
   14  cd ../
   15  echo $SRILM
   16  echo $MACHINE_TYPE
   17  pwd
   18  echo $PWD
   19  make SRILM=$PWD MACHINE_TYPE=macosx
   20  cd test
   21  gnumake all
   22  gnumake all SRILM=$PWD MACHINE_TYPE=macosx
   23  cd ../
   24  make clean
   25  gnumake cleanest
   26  echo SRILM
   27  echo $SRILM
   28  SRILM=$PWD
   29  echo $SRILM
   30  MACHINE_TYPE=macosx
   31  echo $MACHINE_TYPE
   32  make SRILM=$PWD MACHINE_TYPE=macosx
   33  cd test
   34  gnumake all SRILM=$PWD MACHINE_TYPE=macosx
   35  ngram -version
   36  cd ../
   37  ngram -version
   38  echo PATH
   39  echo $PATH
   40  gawk --version
   41  awk --version
   42  awk -W version
   43  awk version
   44  awk 
   45  awk -v
   46  man awk
   47  PATH=$PATH:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx
   48  echo $PATH
   49  echo $MANPATH
   50  make SRILM=$PWD PATH=bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   51  make clean SRILM=$PWD PATH=bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   52  make clean SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   53  make SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   54  cd test
   55  make SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   56  cd ../
   57  make clean
   58  make SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   59  make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
   60  echo $PATH
   61  echo $MANPATH
   62  echo $SRILM
   63* 
   64  cd test
   65  echo $SRILM
   66  echo $MANPATH
   67  echo $PATH
   68  make all
   69  cd ../../../../
   70  cd V3MTRESEARCH/MOSESSUITE/srilm
   71  make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSESSUITE/srilm/man
   72  cd ../../../../
   73  ls
   74  cd Users/lliohumphreys/
   75  ls
   76  cd MT/MOSESSUITE/srilm
   77  ls
   78  make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man LC_NUMERIC=C
   79  cd test
   80  gnumake all
   81  cd ../
   82  gnumake cleanest
   83  cd ../
   84  ls
   85  cd giza-pp
   86  make all
   87  echo $PATH
   88  PATH=$PATH:/opt:/sw
   89  echo $PATH
   90  make clean
   91  make all
   92  sudo apt-get install crt0.o
   93  make clean
   94  make all --static-lib
   95  ./configure help
   96  ./configure --help
   97  make --help
   98  man pushd
   99  ls
  100  cd ..
  101  ls
  102  cd ..
  103  ls
  104  cd ..
  105  ls
  106  cd Desktop/cs
  107  cd Desktop/
  108  ls
  109  cd Csu-45
  110  ls
  111  mkdir -p build/csu
  112  ls
  113  pushd build/csu/
  114  ls
  115  ls
  116  cd ../..
  117  ls
  118  find . -name configure
  119  cd ..
  120  ls
  121  ls
  122  tar -xvf Csu-45.tar ./test
  123  man tar
  124  cd Csu-45
  125  ls
  126  make 
  127  ls
  128  cd ../../../../../
  129  ls
  130  cd usr;
  131  ls
  132  man indr
  133  cd ..
  134  locate indr
  135  find /usr/ -name indr
  136  ld
  137  ls
  138  cd Users/lliohumphreys/MT
  139  cd MOSESSUITE/;ls
  140  cd giza-pp
  141  ls
  142  cat README 
  143  ls
  144  vim README 
  145  vim Makefile 
  146  cd GIZA++-v2/
  147  ls
  148  cat dependencies 
  149  ls
  150  mke
  151  make
  152  vim Parameter.
  153  vim Parameter.h
  154  cd ..
  155  vim Makefile 
  156  ls
  157  cd GIZA++-v2/
  158  ls
  159  ls optimized/
  160  cd ..
  161  cat Makefile 
  162  ls
  163  cd GIZA++-v2/
  164  ls
  165  g++  -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE 
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o 
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o 
optimized/getSentence.o optimized/TTables.o optimized/ATables.o 
optimized/AlignTables.o optimized/main.o optimized/NTables.o 
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o 
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o 
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o 
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o 
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o 
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o 
optimized/ForwardBackward.o -o GIZA++
  166  ls
  167  ls optimized/
  168  ls
  169  g++  -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE 
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o 
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o 
optimized/getSentence.o optimized/TTables.o optimized/ATables.o 
optimized/AlignTables.o optimized/main.o optimized/NTables.o 
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o 
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o 
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o 
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o 
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o 
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o 
optimized/ForwardBackward.o -o GIZA++
  170  ls GIZA++  -l
  171  ls -l GIZA++
  172  g++  -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE 
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o 
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o 
optimized/getSentence.o optimized/TTables.o optimized/ATables.o 
optimized/AlignTables.o optimized/main.o optimized/NTables.o 
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o 
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o 
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o 
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o 
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o 
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o 
optimized/ForwardBackward.o -static -o GIZA++
  173  g++  -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE 
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o 
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o 
optimized/getSentence.o optimized/TTables.o optimized/ATables.o 
optimized/AlignTables.o optimized/main.o optimized/NTables.o 
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o 
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o 
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o 
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o 
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o 
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o 
optimized/ForwardBackward.o -o GIZA++
  174  ./GIZA++ 
  175  cd ../
  176  make mkcls-v2
  177  cd GIZA++-v2/
  178  make snt2cooc.out
  179  cd ../
  180  cp GIZA++-v2/GIZA++ bin/
  181  mkdir -p bin
  182  cp GIZA++-v2/GIZA++ bin/
  183  cp GIZA++-v2/snt2cooc.out bin/
  184* cp giza-pp/mkcls-v2/mkcls bin/
  185  cd bin
  186  ls
  187  cd ../
  188  mkdir -p moses
  189  svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk moses
  190  cd ../
  191  mkdir -p bin
  192  cp giza-pp/GIZA++-v2/GIZA++ bin/
  193  cp giza-pp/mkcls-v2/mkcls bin/
  194  cp giza-pp/GIZA++-v2/snt2cooc.out bin/
  195  cd moses
  196  mkdir -p moses
  197  cd moses
  198  ./regenerate-makefiles.sh
  199  touch *
  200  ./regenerate-makefiles.sh
  201  echo PATH
  202  echo $PATH
  203  ./configure --with-srilm=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm
  204  make
  205  mkdir -p bin/moses-scripts
  206  cd ../
  207  mkdir -p bin/moses-scripts
  208  pwd
  209  ls
  210  cd bin/moses-scripts/
  211  ls
  212  pwd
  213  cd ../
  214  ls
  215  cd ../moses
  216  ls
  217  cd moses/scripts
  218  cd moses
  219  ls
  220  cd ../
  221  cd scripts/
  222  make release
  223  export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
  224  echo $PATH
  225  echo $MANPATH
  226  echo $LC_NUMERIC
  227  echo $LC_ALL
  228  echo $MACHINE_TYPE
  229  echo $SRILM
  230  SRILM=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm
  231  LC_NUMERIC=C
  232  LC_ALL=C
  233  PATH=/bin:/sbin:/usr/bin:/usr/sbin:/sw/sbin:/sw/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx
  234  MANPATH=/sw/share/man:/usr/share/man:/sw/lib/perl5/5.8.6/man://Users/lliohumphreys/MT/MOSESSUITE/srilm
  235  MANPATH=/sw/share/man:/usr/share/man:/sw/lib/perl5/5.8.6/man:/Users/lliohumphreys/MT/MOSESSUITE/srilm/man
  236  SRILM=Users/lliohumphreys/MT/MOSESSUITE/srilm/
  237  echo $SRILM
  238  echo $MACHINE_TYPE
  239  echo $LC_ALL
  240  echo $MACHINE_TYPE
  241  echo $MANPATH
  242  echo $PATH
  243  EUROPARL=Users/lliohumphreys/MT/Data/europarl
  244  eco $EUROPARL
  245  echo $EUROPARL
  246  cd $EUROPARL
  247  EUROPARL=/Users/lliohumphreys/MT/Data/europarl
  248  cd $EUROPARL
  249  ./sentence-align-corpus.perl en it
  250  ./sentence-align-corpus.perl en it
  251  cat aligned/en-it/en* > corpus/raw.en
  252  cat aligned/en-it/en/* > corpus/raw.en
  253  cat aligned/en-it/it/* > corpus/raw.it
  254  ./sentence-align-corpus.perl it en
  255  cat aligned/it-en/it/* > corpus/raw.it
  256  cat aligned/it-en/en/* > corpus/raw.en
  257  cd ../../MOSESSUITE/moses/
  258  ls scripts
  259  cd $EUROPARL
  260  whereis tokenizer.perl
  261  cd scripts
  262  cd ../../MOSESSUITE/
  263  scripts/tokenizer.perl -1 it < $EUROPARL/corpus/raw.it > $EUROPARL/corpus/europarl.tok.it
  264  scripts/tokenizer.perl -l en < $EUROPARL/corpus/raw.en > $EUROPARL/corpus/europarl.tok.en
  265  cd bin/moses-scripts
  266  ls
  267  cd scripts-20080811-1801/
  268  ls
  269  cd training/
  270  ls
  271  pwd
  272  cd ../../../
  273  cd ../
  274  bin/moses-scripts/scripts-20080811-1801/training/clean-corpus-n.perl $EUROPARL/corpus/europarl.tok en it $EUROPARL/corpus/europarl.clean 1 40
  275  scripts/lowercase.perl < $EUROPARL/corpus/europarl.clean.en > $EUROPARL/corpus/europarl.lowercased.en
  276  scripts/lowercase.perl < $EUROPARL/corpus/europarl.clean.it > $EUROPARL/corpus/europarl.lowercased.it
  277  mkdir $EUROPARL/lm
  278  scripts/tokenizer.perl -l en < $EUROPARL/corpus/raw.en > $EUROPARL/lm/europarl.tok
  279  ls
  280  srilm/bin/macosx -order 5 -interpolate -kndiscount -text $EUROPARL/lm/europarl.lowercased -lm $EUROPARL/lm/europarl.lm
  281  srilm/bin/macosx/ngram-count -order 5 -interpolate -kndiscount -text $EUROPARL/lm/europarl.lowercased -lm $EUROPARL/lm/europarl.lm
  282  scripts/lowercase.perl < $EUROPARL/lm/europarl.tok> $EUROPARL/lm/europarl.lowercased
  283  srilm/bin/macosx/ngram-count -order 5 -interpolate -kndiscount -text $EUROPARL/lm/europarl.lowercased -lm $EUROPARL/lm/europarl.lm
  284* 
  285  history > 110808history.txt
