Please see the message below, resent without the attachment.

Thank you,
Llio Humphreys
On Fri, Jul 25, 2008 at 9:50 AM, Llio Humphreys <[EMAIL PROTECTED]> wrote:
> Dear Moses Group,
>
> I am having difficulty running the Moses software (not the recently
> released version) following the guidelines at
> http://www.statmt.org/wmt07/baseline.html. I attach a record of the
> final part of the terminal session for your information.
>
> I started with two parallel input files, one sentence per line, both
> already tokenised, tab-delimited, and in ASCII (is UTF-8 better?).
>
> I followed the instructions under the Prepare Data heading. I briefly
> inspected the .tok output files and preferred my original tokenised
> version (e.g. reference numbers containing / were not split up). So I
> renamed the original input files as .tok files, filtered out long
> sentences, and lowercased the training data.
>
> I then proceeded to the Language Model section. The instructions seemed
> much the same as for the Prepare Data section, so I moved the
> lowercased files from the corpus directory to the lm directory. Is
> this the right thing to do?
>
> Next I trained the model, and the system crashed with the following
> message:
>
> Executing:
> bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract ./model/aligned.0.en ./model/aligned.0.cy ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
> PhraseExtract v1.3.0, written by Philipp Koehn
> phrase extraction from an aligned parallel corpus
> (also extracting orientation)
> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
> cat: ./model/extract.0-0.o.part*: No such file or directory
> Exit code: 1
> Died at bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl line 899.
>
> So my question is: am I giving Moses the wrong data to work with?
>
> To find out, I downloaded Europarl from
> http://www.statmt.org/europarl/. It contained version 2 rather than
> version 3, but I thought I might try using it nevertheless. I ran
> sentence-align-corpus.perl:
>
> ./sentence-align-corpus.perl en de
>
> but it exited with the following message:
>
> Died at ./sentence-align-corpus.perl line 16.
>
> Line 16 of sentence-align-corpus.perl says:
>
> die unless -e "$dir/$l1";
>
> Should I continue with Europarl 2, or is it possible to download
> Europarl 3 from somewhere?
>
> Alternatively, could you explain the difference in purpose and format
> between wmt07/training/europarl-v3.fr-en.fr and
> wmt07/training/europarl-v3.en? Just to clarify: am I correct in
> saying that the Prepare Data section is about training the translation
> model, i.e. word and phrase alignments, and the Language Model section
> is about creating a language model for the language we're translating
> into? Does the Prepare Data section start with two plain-text parallel
> corpora with one sentence per line, or with something more elaborate?
> Maybe wmt07/training/europarl-v3.fr-en.fr is a plain-text file with
> French sentence 1 followed by English sentence 1, then French sentence
> 2 followed by English sentence 2, and so on? I could then adapt the
> Welsh-English corpus I'm using accordingly.
>
> Otherwise, is there a problem with the software or its implementation
> on a Mac? Would you recommend that I try the recently released version
> of Moses? Is there some way to install the new version of Moses without
> uninstalling the old one? (I'm wondering about environment variables.)
>
> Thank you,
> Llio Humphreys

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
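Several of the questions above turn on what shape the training input should have. As a rough sketch, assuming the training scripts expect two plain-text files with one sentence per line, tokens separated by spaces, and line n of one file being the translation of line n of the other, the Python below checks a corpus pair and applies the "filter out long sentences, then lowercase" steps described in the message. The helper names `check_and_clean` and `clean_files` are hypothetical (not part of Moses), and the 40-token cut-off is an assumption modelled on the baseline's cleaning step; substitute whatever limit your own cleaning uses.

```python
from typing import List, Tuple

# Assumed cut-off (modelled on a 1-40 token cleaning step); adjust as needed.
MAX_TOKENS = 40


def check_and_clean(src_lines: List[str], tgt_lines: List[str],
                    max_tokens: int = MAX_TOKENS
                    ) -> Tuple[List[str], List[str], List[str]]:
    """Check two line-aligned corpora and return (src, tgt, problems).

    Line n of src_lines is assumed to be the translation of line n of
    tgt_lines. Pairs with an empty side or more than max_tokens tokens on
    either side are dropped; surviving pairs are lowercased and re-joined
    with single spaces.
    """
    problems: List[str] = []
    if len(src_lines) != len(tgt_lines):
        problems.append(
            f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}")
    clean_src: List[str] = []
    clean_tgt: List[str] = []
    # zip() stops at the shorter file; the mismatch is reported above.
    for n, (s, t) in enumerate(zip(src_lines, tgt_lines), start=1):
        if "\t" in s or "\t" in t:
            # Tabs are reported; split() below still treats them as separators.
            problems.append(f"line {n}: contains a tab")
        s_toks, t_toks = s.split(), t.split()
        if not s_toks or not t_toks:
            problems.append(f"line {n}: empty sentence, dropped")
            continue
        if len(s_toks) > max_tokens or len(t_toks) > max_tokens:
            continue  # long pair: dropped, as a length filter would do
        clean_src.append(" ".join(s_toks).lower())
        clean_tgt.append(" ".join(t_toks).lower())
    return clean_src, clean_tgt, problems


def clean_files(src_path: str, tgt_path: str,
                out_src: str, out_tgt: str) -> List[str]:
    """Hypothetical file wrapper: write the cleaned pair, return problems."""
    with open(src_path, encoding="utf-8") as f:
        src = f.readlines()
    with open(tgt_path, encoding="utf-8") as f:
        tgt = f.readlines()
    clean_src, clean_tgt, problems = check_and_clean(src, tgt)
    with open(out_src, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in clean_src)
    with open(out_tgt, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in clean_tgt)
    return problems
```

Reading with UTF-8 also accepts plain ASCII files, since ASCII is a subset of UTF-8. If the check reports a line-count mismatch, tabs, or many empty lines, those are worth resolving before rerunning training.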
