Dear Josh,
You may have already received my email about the following problem when
building the language model:
Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
cat: ./model/extract.0-0.o.part*: No such file or directory
Exit code: 1
So as not to conflate issues, I will ask my other questions in this
separate email. I address them primarily to you because they may
be Mac-related, and you have successfully installed Moses on a Mac.
I have attached my history file of the commands I entered. It shows
that I tried to install this in two separate folders, and that the second
installation worked up to a point.
1. You mention that Moses does not use environment variables.
However, in order to get SRILM to work, I found it necessary to create
environment variables and pass these on to SRILM's make:
make SRILM=$PWD MACHINE_TYPE=macosx
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk
MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man LC_NUMERIC=C
In addition, I also had to enter the following command for
moses-scripts:
export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
If I open a new terminal and echo these variables, most of them are
blank, and PATH just gives the default bin paths.
So, how do I make them permanent? I assume that if I want to use
Moses again, it needs to have access to these variables. How can I
ensure that I can close the terminal, go home, open a new terminal the
next day and get Moses working again? A colleague suggested I add
these environment variables to the .bashrc file so that they are set in
each new terminal session. However, my Mac does not appear to have
a .bashrc file by default, and when I created one in my home
directory and opened a new terminal, the .bashrc file was not read.
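For what it is worth, a common explanation for a .bashrc file being
ignored is that Terminal on Mac OS X starts bash as a login shell, and a
login shell reads ~/.bash_profile rather than ~/.bashrc. A minimal sketch
of lines that could go in ~/.bash_profile follows. The paths are the ones
quoted above; the MOSES_SUITE variable is a hypothetical helper introduced
only to shorten them, and whether Moses itself needs any of these
variables is an assumption rather than something confirmed in this thread.

```shell
# Sketch of lines for ~/.bash_profile (read by bash login shells).
# MOSES_SUITE is a made-up helper variable; adjust it to your own layout.
MOSES_SUITE="$HOME/MT/MOSESSUITE"

# Variables that were passed to SRILM's make by hand in this message.
export SRILM="$MOSES_SUITE/srilm"
export MACHINE_TYPE=macosx
export LC_NUMERIC=C

# Location of the released moses-scripts directory.
export SCRIPTS_ROOTDIR="$MOSES_SUITE/bin/moses-scripts/scripts-20080811-1801"

# Append the SRILM binaries and man pages to the existing search paths
# instead of overwriting them.
export PATH="$PATH:$SRILM/bin:$SRILM/bin/$MACHINE_TYPE"
export MANPATH="${MANPATH:-}:$SRILM/man"
```

After saving the file, opening a new terminal (or running
". ~/.bash_profile" in the current one) and then "echo $SRILM" should show
whether the settings were picked up.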
2. You say that you ran the decoder on your laptop just fine, but had
to change a few scripts for training. I have very basic knowledge of
Unix systems and installing open-source software: would it be possible
for you to detail the changes you made to the scripts to get it to run
on a Mac? Although I need this information urgently, it may also be
useful for other students who are installing Moses on a Mac and who
may also have basic knowledge of Unix installation procedures.
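The reply quoted below mentions only one concrete difference: the Mac uses
'gzcat' where the training scripts call 'zcat'. As a hedged illustration
(not the actual edits made to the scripts, which are not given in this
thread), 'gzip -cd' behaves the same on Linux and Mac OS X and could stand
in for either command. The function name zcat_portable is made up for this
example.

```shell
# Portable stand-in for 'zcat' on gzip files: 'gzip -cd' writes the
# decompressed data to standard output on both Linux and Mac OS X.
# zcat_portable is a hypothetical name, not part of Moses.
zcat_portable() {
    gzip -cd "$@"
}

# Small self-check using a temporary compressed file.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/zcat-demo.XXXXXX")
printf 'hello moses\n' | gzip -c > "$tmpdir/sample.gz"
zcat_portable "$tmpdir/sample.gz"   # prints: hello moses
```

In a training script, the same idea would mean replacing a call to 'zcat'
with 'gzip -cd' (or with 'gzcat' on the Mac only).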
3. My final question, which is embarrassingly basic: can I use a single
installation of Moses for different corpora, or do I need a
separate installation for each one? Can I have separate installations
of SRILM, Giza and mkcls, or should they all reference the same
libraries?
Thank you for your help and patience,
Kind regards,
Llio Humphreys
On 7/25/08, Josh Schroeder <[EMAIL PROTECTED]> wrote:
> Hi Llio,
>
> You've got a lot of questions spread around in this message. I'll try to
> get to most of them.
>
>
> >
> > >
> > > Dear Moses Group,
> > >
> > > I am having difficulties running the Moses software (not the recently
> > > released version), following the guidelines at
> > > http://www.statmt.org/wmt07/baseline.html and I attach a record of the
> > > final part of the terminal session for your information.
> > >
> > > I started with parallel input files, with each line containing one
> > > sentence, both already tokenised, tab delimited, and in ASCII (is
> > > UTF-8 better?)
> > >
> >
>
> Moses itself is encoding-agnostic - use whatever encoding you want. Some of
> the support scripts on statmt.org (tokenizer.perl, for example) are geared
> to work better with UTF-8. I find UTF-8 a lot easier to use -- especially
> when you start dealing with multiple language pairs with different native
> encodings.
>
>
> >
> > > I followed the instructions under the Prepare Data heading. I briefly
> > > inspected the .tok output files, and preferred the original tokenised
> > > version e.g. reference numbers with / were not split up. So, I
> > > renamed the original input files as .tok files, filtered out long
> > > sentences and lowercased the training data.
> > >
> >
>
> I think you're saying you didn't like the behavior of our sample tokenizer
> with regard to some feature in the training data. If your original files
> are already tokenized in some way, you can just use that data instead of
> re-applying tokenization. Some form of tokenization is definitely important
> though: you don't want "no," "no!" "no." and "no?" to all be treated as
> distinct words instead of multiple instances of the word "no".
>
>
> >
> > > I then proceeded to the Language Model. The instructions seemed pretty
> > > much the same as for the Prepare Data section, so I moved the
> > > lowercased files from the corpus directory to the lm directory. Is
> > > this the right thing to do?
> > >
> >
>
> This is an *acceptable* thing to do, but maybe not the best choice. More
> data for language models is always better. When we make the Europarl data
> parallel for a given language pair, we drop mis-matched sentences,
> paragraphs, even whole documents that don't have a version in both
> languages. In the Prepare Data section, as you mentioned, we filter out long
> sentences. All of that dropped data on the target side can be useful to the
> language model. That's why a non-paired monolingual .en file is used in the
> example, and is only tokenized and lowercased, not filtered for long
> sentences.
>
>
> >
> > > I then trained the model and the system crashed with the following
> > > message:
> > >
> > > Executing: bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract
> > > ./model/aligned.0.en ./model/aligned.0.cy
> > > ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
> > > PhraseExtract v1.3.0, written by Philipp Koehn
> > > phrase extraction from an aligned parallel corpus
> > > (also extracting orientation)
> > > Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
> > > cat: ./model/extract.0-0.o.part*: No such file or directory
> > > Exit code: 1
> > > Died at bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl
> > > line 899.
> > >
> > > So, my question is: am I giving Moses the wrong data to work with?
> > >
> >
>
> I think it's more likely that some file is misplaced (you say you 'moved'
> the lowercased files to the lm directory - did you copy them or delete
> them?) or that some part of the train-factored-phrase-model.perl process
> isn't running correctly. The full stdout/stderr of the perl script should
> help you debug
> what is getting done and what is failing. The "Executing:" calls are just
> copies of what is sent to the command line, so you can always try copy and
> pasting that and running it yourself outside of the perl script to debug
> what's going wrong. You've got the perl script, too, so poke around inside
> it and figure out what it's doing. That's the beauty of open-source. :)
>
>
> >
> > > In order to find out, I downloaded europarl from
> > > http://www.statmt.org/europarl/. It contained version 2 rather than
> > > version 3 but I thought nevertheless that I might try using it. I ran
> > > sentence-align-corpus.perl:
> > >
> >
>
> The downloads from that page contain version 3, not v2. What made you think
> it was version 2? Maybe we missed a readme somewhere, but the data is v3 for
> sure.
>
>
> >
> > > ./sentence-align-corpus.perl en de
> > >
> > > , but it exited with the following message:
> > >
> > > Died at ./sentence-align-corpus.perl line 16.
> > >
> > > sentence-align-corpus.perl line 16 says:
> > > die unless -e "$dir/$l1";
> > >
> >
>
> Yeah, there was a bug in sentence-align-corpus. Line 9 should read
>
> my $dir = "txt";
>
> It was looking in the wrong directory. You can either fix your version or
> re-download the tools.tgz file from the Europarl page.
>
>
> >
> > > Should I continue with europarl 2 or is it possible to download
> > > europarl 3 from somewhere?
> > >
> >
>
> See above. v3 is what is available. v2 is available in an archive page at
> <http://www.statmt.org/europarl/archives.html>
>
>
> >
> > > Alternatively would it be possible for you to explain the difference
> > > in purpose and format between wmt07/training/europarl-v3.fr-en.fr and
> > > wmt07/training/europarl-v3.en?
> > >
> >
>
> You can get the files that tutorial is talking about from
> <http://www.statmt.org/wmt07/shared-task.html#download> and
> look through them yourself. The europarl-v3.fr-en.* files come in a pair.
> There should be europarl-v3.fr-en.en and europarl-v3.fr-en.fr. All 3 files
> have one sentence per line, europarl-v3.fr-en.en and europarl-v3.fr-en.fr
> have an identical number of lines, and europarl-v3.en has a superset of the
> europarl-v3.fr-en.en data. Expanding on what I said about LM data above,
> more data can go into the non-paired file because we don't have to match
> documents across two languages. We need paired data for word alignments, but
> any monolingual target data is useful for language modeling.
>
>
> >
> > > Just to clarify: am I correct in
> > > saying that the Prepare Data section is about training the translation
> > > model i.e. word and phrase alignments, and Language model section is
> > > about creating a language model for the language we're translating to?
> > >
> >
>
> Correct.
>
>
> >
> > > Does the Prepare Data section start with two plain text parallel
> > > corpora with sentences on each line or is something more elaborate
> > > than that? Maybe the wmt07/training/europarl-v3.fr-en.fr is a plain
> > > text file with French sentence 1 followed by English sentence 1
> > > followed by French sentence 2 followed by English sentence 2 etc? I
> > > could then adapt the Welsh-English corpus I'm using accordingly.
> > >
> >
>
> These paired files should have exactly the same number of lines. Line 1 in
> .en and Line 1 in .fr should be the same sentence, one file in English and
> one in French. These are the results of running sentence-align-corpus,
> combining all the files for each language, and filtering out the lines with
> XML tags. If you want to play with prepared files and not "roll your own"
> from the Europarl data, check out the wmt07 and wmt08 websites for
> downloadable monolingual and parallel training data.
>
>
> >
> > > Otherwise, is there a problem with the software/implementation on a
> > > Mac system? Would you recommend that I try the recently released
> > > version of Moses? Is there some way to install the new version of
> > > Moses without uninstalling the other one (I'm wondering about
> > > environment variables)
> > >
> >
>
> I've run the decoder on my mac laptop just fine. You may have to change a
> few scripts for training - for example, I know the mac uses 'gzcat' instead
> of 'zcat'. Moses doesn't use environment variables. Compile it in a
> different directory and you've got a second copy!
>
>
> Good luck!
>
> Josh
>
1 GCC
2 gcc --version
3 wish
4 gcc --version
5 cd MTRESEARCH/MOSES08/srilm
6 pwd
7 gnumake World
8 PATH=$PATH:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin
9 echo $PATH
10 MANPATH=Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
11 echo $MANPATH
12 cd test
13 gnumake all
14 cd ../
15 echo $SRILM
16 echo $MACHINE_TYPE
17 pwd
18 echo $PWD
19 make SRILM=$PWD MACHINE_TYPE=macosx
20 cd test
21 gnumake all
22 gnumake all SRILM=$PWD MACHINE_TYPE=macosx
23 cd ../
24 make clean
25 gnumake cleanest
26 echo SRILM
27 echo $SRILM
28 SRILM=$PWD
29 echo $SRILM
30 MACHINE_TYPE=macosx
31 echo $MACHINE_TYPE
32 make SRILM=$PWD MACHINE_TYPE=macosx
33 cd test
34 gnumake all SRILM=$PWD MACHINE_TYPE=macosx
35 ngram -version
36 cd ../
37 ngram -version
38 echo PATH
39 echo $PATH
40 gawk --version
41 awk --version
42 awk -W version
43 awk version
44 awk
45 awk -v
46 man awk
47 PATH=$PATH:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx
48 echo $PATH
49 echo $MANPATH
50 make SRILM=$PWD PATH=bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
51 make clean SRILM=$PWD PATH=bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
52 make clean SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
53 make SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
54 cd test
55 make SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
56 cd ../
57 make clean
58 make SRILM=$PWD PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
59 make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/man
60 echo $PATH
61 echo $MANPATH
62 echo $SRILM
63*
64 cd test
65 echo $SRILM
66 echo $MANPATH
67 echo $PATH
68 make all
69 cd ../../../../
70 cd V3MTRESEARCH/MOSESSUITE/srilm
71 make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin:/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MTRESEARCH/MOSESSUITE/srilm/man
72 cd ../../../../
73 ls
74 cd Users/lliohumphreys/
75 ls
76 cd MT/MOSESSUITE/srilm
77 ls
78 make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man LC_NUMERIC=C
79 cd test
80 gnumake all
81 cd ../
82 gnumake cleanest
83 cd ../
84 ls
85 cd giza-pp
86 make all
87 echo $PATH
88 PATH=$PATH:/opt:/sw
89 echo $PATH
90 make clean
91 make all
92 sudo apt-get install crt0.o
93 make clean
94 make all --static-lib
95 ./configure help
96 ./configure --help
97 make --help
98 man pushd
99 ls
100 cd ..
101 ls
102 cd ..
103 ls
104 cd ..
105 ls
106 cd Desktop/cs
107 cd Desktop/
108 ls
109 cd Csu-45
110 ls
111 mkdir -p build/csu
112 ls
113 pushd build/csu/
114 ls
115 ls
116 cd ../..
117 ls
118 find . -name configure
119 cd ..
120 ls
121 ls
122 tar -xvf Csu-45.tar ./test
123 man tar
124 cd Csu-45
125 ls
126 make
127 ls
128 cd ../../../../../
129 ls
130 cd usr;
131 ls
132 man indr
133 cd ..
134 locate indr
135 find /usr/ -name indr
136 ld
137 ls
138 cd Users/lliohumphreys/MT
139 cd MOSESSUITE/;ls
140 cd giza-pp
141 ls
142 cat README
143 ls
144 vim README
145 vim Makefile
146 cd GIZA++-v2/
147 ls
148 cat dependencies
149 ls
150 mke
151 make
152 vim Parameter.
153 vim Parameter.h
154 cd ..
155 vim Makefile
156 ls
157 cd GIZA++-v2/
158 ls
159 ls optimized/
160 cd ..
161 cat Makefile
162 ls
163 cd GIZA++-v2/
164 ls
165 g++ -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o
optimized/getSentence.o optimized/TTables.o optimized/ATables.o
optimized/AlignTables.o optimized/main.o optimized/NTables.o
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o
optimized/ForwardBackward.o -o GIZA++
166 ls
167 ls optimized/
168 ls
169 g++ -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o
optimized/getSentence.o optimized/TTables.o optimized/ATables.o
optimized/AlignTables.o optimized/main.o optimized/NTables.o
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o
optimized/ForwardBackward.o -o GIZA++
170 ls GIZA++ -l
171 ls -l GIZA++
172 g++ -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o
optimized/getSentence.o optimized/TTables.o optimized/ATables.o
optimized/AlignTables.o optimized/main.o optimized/NTables.o
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o
optimized/ForwardBackward.o -static -o GIZA++
173 g++ -Wall -W -Wno-deprecated -O3 -DNDEBUG -DWORDINDEX_WITH_4_BYTE
-DBINARY_SEARCH_FOR_TTABLE optimized/Parameter.o optimized/myassert.o
optimized/Perplexity.o optimized/model1.o optimized/model2.o optimized/model3.o
optimized/getSentence.o optimized/TTables.o optimized/ATables.o
optimized/AlignTables.o optimized/main.o optimized/NTables.o
optimized/model2to3.o optimized/collCounts.o optimized/alignment.o
optimized/vocab.o optimized/MoveSwapMatrix.o optimized/transpair_model3.o
optimized/transpair_model5.o optimized/transpair_model4.o optimized/utility.o
optimized/parse.o optimized/reports.o optimized/model3_viterbi.o
optimized/model3_viterbi_with_tricks.o optimized/Dictionary.o
optimized/model345-peg.o optimized/hmm.o optimized/HMMTables.o
optimized/ForwardBackward.o -o GIZA++
174 ./GIZA++
175 cd ../
176 make mkcls-v2
177 cd GIZA++-v2/
178 make snt2cooc.out
179 cd ../
180 cp GIZA++-v2/GIZA++ bin/
181 mkdir -p bin
182 cp GIZA++-v2/GIZA++ bin/
183 cp GIZA++-v2/snt2cooc.out bin/
184* cp giza-pp/mkcls-v2/mkcls bin/
185 cd bin
186 ls
187 cd ../
188 mkdir -p moses
189 svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk moses
190 cd ../
191 mkdir -p bin
192 cp giza-pp/GIZA++-v2/GIZA++ bin/
193 cp giza-pp/mkcls-v2/mkcls bin/
194 cp giza-pp/GIZA++-v2/snt2cooc.out bin/
195 cd moses
196 mkdir -p moses
197 cd moses
198 ./regenerate-makefiles.sh
199 touch *
200 ./regenerate-makefiles.sh
201 echo PATH
202 echo $PATH
203 ./configure --with-srilm=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm
204 make
205 mkdir -p bin/moses-scripts
206 cd ../
207 mkdir -p bin/moses-scripts
208 pwd
209 ls
210 cd bin/moses-scripts/
211 ls
212 pwd
213 cd ../
214 ls
215 cd ../moses
216 ls
217 cd moses/scripts
218 cd moses
219 ls
220 cd ../
221 cd scripts/
222 make release
223 export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801
224 echo $PATH
225 echo $MANPATH
226 echo $LC_NUMERIC
227 echo $LC_ALL
228 echo $MACHINE_TYPE
229 echo $SRILM
230 SRILM=/Users/lliohumphreys/MTRESEARCH/MOSES08/srilm
231 LC_NUMERIC=C
232 LC_ALL=C
233 PATH=/bin:/sbin:/usr/bin:/usr/sbin:/sw/sbin:/sw/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx
234 MANPATH=/sw/share/man:/usr/share/man:/sw/lib/perl5/5.8.6/man://Users/lliohumphreys/MT/MOSESSUITE/srilm
235 MANPATH=/sw/share/man:/usr/share/man:/sw/lib/perl5/5.8.6/man:/Users/lliohumphreys/MT/MOSESSUITE/srilm/man
236 SRILM=Users/lliohumphreys/MT/MOSESSUITE/srilm/
237 echo $SRILM
238 echo $MACHINE_TYPE
239 echo $LC_ALL
240 echo $MACHINE_TYPE
241 echo $MANPATH
242 echo $PATH
243 EUROPARL=Users/lliohumphreys/MT/Data/europarl
244 eco $EUROPARL
245 echo $EUROPARL
246 cd $EUROPARL
247 EUROPARL=/Users/lliohumphreys/MT/Data/europarl
248 cd $EUROPARL
249 ./sentence-align-corpus.perl en it
250 ./sentence-align-corpus.perl en it
251 cat aligned/en-it/en* > corpus/raw.en
252 cat aligned/en-it/en/* > corpus/raw.en
253 cat aligned/en-it/it/* > corpus/raw.it
254 ./sentence-align-corpus.perl it en
255 cat aligned/it-en/it/* > corpus/raw.it
256 cat aligned/it-en/en/* > corpus/raw.en
257 cd ../../MOSESSUITE/moses/
258 ls scripts
259 cd $EUROPARL
260 whereis tokenizer.perl
261 cd scripts
262 cd ../../MOSESSUITE/
263 scripts/tokenizer.perl -1 it < $EUROPARL/corpus/raw.it > $EUROPARL/corpus/europarl.tok.it
264 scripts/tokenizer.perl -l en < $EUROPARL/corpus/raw.en > $EUROPARL/corpus/europarl.tok.en
265 cd bin/moses-scripts
266 ls
267 cd scripts-20080811-1801/
268 ls
269 cd training/
270 ls
271 pwd
272 cd ../../../
273 cd ../
274 bin/moses-scripts/scripts-20080811-1801/training/clean-corpus-n.perl $EUROPARL/corpus/europarl.tok en it $EUROPARL/corpus/europarl.clean 1 40
275 scripts/lowercase.perl < $EUROPARL/corpus/europarl.clean.en > $EUROPARL/corpus/europarl.lowercased.en
276 scripts/lowercase.perl < $EUROPARL/corpus/europarl.clean.it > $EUROPARL/corpus/europarl.lowercased.it
277 mkdir $EUROPARL/lm
278 scripts/tokenizer.perl -l en < $EUROPARL/corpus/raw.en > $EUROPARL/lm/europarl.tok
279 ls
280 srilm/bin/macosx -order 5 -interpolate -kndiscount -text $EUROPARL/lm/europarl.lowercased -lm $EUROPARL/lm/europarl.lm
281 srilm/bin/macosx/ngram-count -order 5 -interpolate -kndiscount -text $EUROPARL/lm/europarl.lowercased -lm $EUROPARL/lm/europarl.lm
282 scripts/lowercase.perl < $EUROPARL/lm/europarl.tok> $EUROPARL/lm/europarl.lowercased
283 srilm/bin/macosx/ngram-count -order 5 -interpolate -kndiscount -text $EUROPARL/lm/europarl.lowercased -lm $EUROPARL/lm/europarl.lm
284*
285 history > 110808history.txt