[Moses-support] Moses: Prepare Data, Build Language Model and Train Model

Llio Humphreys Fri, 25 Jul 2008 02:00:53 -0700

Please see the message below, resent without the attachment.  Thank you,  Llio Humphreys


On Fri, Jul 25, 2008 at 9:50 AM, Llio Humphreys <[EMAIL PROTECTED]> wrote:
> Dear Moses Group,
>
> I am having difficulties running the Moses software (not the recently
> released version), following the guidelines at
> http://www.statmt.org/wmt07/baseline.html and I attach a record of the
> final part of the terminal session for your information.
>
> I started with parallel input files, with each line containing one
> sentence, both already tokenised, tab delimited, and in ASCII (is
> UTF-8 better?)
>
> I followed the instructions under the Prepare Data heading.  I briefly
> inspected the .tok output files, and preferred the original tokenised
> version, e.g. reference numbers with / were not split up.  So, I
> renamed the original input files as .tok files, filtered out long
> sentences and lowercased the training data.
>
> I then proceeded to the Language Model. The instructions seemed pretty
> much the same as for the Prepare Data section, so I moved the
> lowercased files from the corpus directory to the lm directory. Is
> this the right thing to do?
>
> I then trained the model and the system crashed with the following message:
>
> Executing: 
> bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract
> ./model/aligned.0.en ./model/aligned.0.cy
> ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
> PhraseExtract v1.3.0, written by Philipp Koehn
> phrase extraction from an aligned parallel corpus
> (also extracting orientation)
> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
> cat: ./model/extract.0-0.o.part*: No such file or directory
> Exit code: 1
> Died at 
> bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl
> line 899.
>
> So, my question is: am I giving Moses the wrong data to work with?
>
> In order to find out, I downloaded europarl from
> http://www.statmt.org/europarl/.  It contained version 2 rather than
> version 3 but I thought nevertheless that I might try using it.  I ran
> sentence-align-corpus.perl:
>
> ./sentence-align-corpus.perl en de
>
> It exited with the following message:
>
> Died at ./sentence-align-corpus.perl line 16.
>
> sentence-align-corpus.perl line 16 says:
> die unless -e "$dir/$l1";
>
> Should I continue with europarl 2 or is it possible to download
> europarl 3 from somewhere?
>
> Alternatively would it be possible for you to explain the difference
> in purpose and format between wmt07/training/europarl-v3.fr-en.fr and
> wmt07/training/europarl-v3.en?  Just to clarify: am I correct in
> saying that the Prepare Data section is about training the
> translation model, i.e. word and phrase alignments, and that the
> Language Model section is about creating a language model for the
> language we're translating to?  Does the Prepare Data section start
> with two plain text parallel corpora with one sentence on each line,
> or is it something more elaborate than that?  Maybe
> wmt07/training/europarl-v3.fr-en.fr is a plain text file with French
> sentence 1 followed by English sentence 1 followed by French sentence
> 2 followed by English sentence 2 etc?  I could then adapt the
> Welsh-English corpus I'm using accordingly.
>
> Otherwise, is there a problem with the software/implementation on a
> Mac system? Would you recommend that I try the recently released
> version of Moses?  Is there some way to install the new version of
> Moses without uninstalling the other one?  (I'm wondering about
> environment variables.)
>
> Thank you,
> Llio Humphreys
>
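
The message above asks whether the wrong data is being given to Moses.
A minimal sketch of a sanity check follows. It assumes the corpus is two
plain-text files with one sentence per line, where line N of one file
translates line N of the other. That layout is the one the message asks
about, not something the thread confirms. The function name
check_parallel and the max_tokens default of 40 are hypothetical,
chosen only for illustration. The sketch does not diagnose the extract
failure; it only rules out mismatched line counts, empty lines and
overlong sentences.

```python
from itertools import zip_longest


def check_parallel(src_path, tgt_path, max_tokens=40):
    """Report basic problems in a line-aligned parallel corpus.

    Assumes one sentence per line and whitespace-separated tokens.
    Line numbers in the report are 1-based.
    """
    report = {"src_lines": 0, "tgt_lines": 0, "empty": [], "too_long": []}
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest keeps going after the shorter file ends, so both
        # files are counted in full.
        for n, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is not None:
                report["src_lines"] += 1
            if t is not None:
                report["tgt_lines"] += 1
            if s is None or t is None:
                continue  # one file has already ended
            s_tok, t_tok = s.split(), t.split()
            if not s_tok or not t_tok:
                report["empty"].append(n)
            elif len(s_tok) > max_tokens or len(t_tok) > max_tokens:
                report["too_long"].append(n)
    report["same_length"] = report["src_lines"] == report["tgt_lines"]
    return report
```

For example, check_parallel("corpus.en", "corpus.cy") would be called on
the two lowercased training files (the file names here are placeholders).
If same_length is False, or the empty list is not empty, the files are
not line-aligned in the way this sketch assumes.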
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
