[Moses-support] FW: Moses: Prepare Data, Build Language Model and Train Model

Hieu Hoang Fri, 01 Aug 2008 09:08:05 -0700

this may be similar to the problem the UPV guys encountered when
running under cygwin


the default Mac filesystem (HFS+) is case INSENSITIVE.
        http://docs.info.apple.com/article.html?artnum=107863
however, giza++ creates 2 files whose names differ only in case, eg
    blah.a3.final
    blah.A3.final
so one overwrites the other.
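
to see which output names would clash on a case-insensitive filesystem, something like the sketch below may help. it is only an illustration: the function name find_case_collisions is made up here, it is not part of giza++ or moses, and it assumes that lowercasing is a good enough approximation of how the filesystem compares names.

```python
from collections import defaultdict


def find_case_collisions(filenames):
    """Return groups of names that differ only by letter case.

    Hypothetical helper, not part of giza++ or moses. Lowercasing is
    an assumption standing in for the filesystem's own comparison.
    """
    groups = defaultdict(list)
    for name in filenames:
        groups[name.lower()].append(name)
    # Keep only groups with more than one distinct spelling.
    return [
        sorted(set(names))
        for names in groups.values()
        if len(set(names)) > 1
    ]


# The two names from the example above collide; the third does not.
print(find_case_collisions(["blah.a3.final", "blah.A3.final", "blah.t3.final"]))
```

run it on a list of names taken from a case-sensitive machine, since on the Mac one of the two files is already gone by the time you list the directory.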

you need to change the giza++ code, or run under a case-sensitive
filesystem. ideally, it should be changed in the trunk giza++ code
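
if you are not sure whether the directory you train in is case sensitive, a quick probe along these lines should tell you. this is a minimal sketch using only the python standard library; the function name is_case_sensitive and the probe file names are invented for the example.

```python
import os
import tempfile


def is_case_sensitive(directory="."):
    """Return True if `directory` is on a case-sensitive filesystem.

    Hypothetical helper: it writes one probe file in a temporary
    subdirectory and checks whether the same name in a different
    case resolves to that file.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        lower = os.path.join(tmp, "probe.a3")
        upper = os.path.join(tmp, "probe.A3")
        with open(lower, "w") as handle:
            handle.write("probe")
        # On a case-insensitive filesystem the upper-case name
        # already exists, because it refers to the same file.
        return not os.path.exists(upper)
```

call it with the directory where the training run writes its output, which must be writable. if it returns False there, the two giza++ alignment files will clash in that location.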


-----Original Message-----
From: Josh Schroeder [mailto:[EMAIL PROTECTED] 
Sent: 01 August 2008 16:56
To: Hieu Hoang
Subject: Fwd: [Moses-support] Moses: Prepare Data, Build Language Model and
Train Model



Begin forwarded message:

> From: "Llio Humphreys" <[EMAIL PROTECTED]>
> Date: 25 July 2008 10:00:00 BST
> To: moses-support <[email protected]>
> Subject: [Moses-support] Moses: Prepare Data, Build Language Model and 
> Train Model
>
> Please see message without attachment.  Thank you,  Llio Humphreys
>
> On Fri, Jul 25, 2008 at 9:50 AM, Llio Humphreys 
> <[EMAIL PROTECTED]> wrote:
>> Dear Moses Group,
>>
>> I am having difficulties running the Moses software (not the recently 
>> released version), following the guidelines at 
>> http://www.statmt.org/wmt07/baseline.html and I attach a record of 
>> the final part of the terminal session for your information.
>>
>> I started with parallel input files, with each line containing one 
>> sentence, both already tokenised, tab delimited, and in ASCII (is
>> UTF-8 better?)
>>
>> I followed the instructions under the Prepare Data heading.  I 
>> briefly inspected the .tok output files, and preferred the original 
>> tokenised version e.g. reference numbers with / were not split up.  
>> So, I renamed the original input files as .tok files, filtered out 
>> long sentences and lowercased the training data.
>>
>> I then proceeded to the Language Model. The instructions seemed 
>> pretty much the same as for the Prepare Data section, so I moved the 
>> lowercased files from the corpus directory to the lm directory. Is 
>> this the right thing to do?
>>
>> I then trained the model and the system crashed with the following 
>> message:-
>>
>> Executing: bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract ./model/aligned.0.en ./model/aligned.0.cy ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
>> PhraseExtract v1.3.0, written by Philipp Koehn
>> phrase extraction from an aligned parallel corpus (also extracting orientation)
>> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
>> cat: ./model/extract.0-0.o.part*: No such file or directory
>> Exit code: 1
>> Died at bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl line 899.
>>
>> So, my question is: am I giving Moses the wrong data to work with?
>>
>> In order to find out, I downloaded europarl from 
>> http://www.statmt.org/europarl/.  It contained version 2 rather than 
>> version 3 but I thought nevertheless that I might try using it.  I 
>> ran sentence-align-corpus.perl:
>>
>> ./sentence-align-corpus.perl en de
>>
>> but it exited with the following message:
>>
>> Died at ./sentence-align-corpus.perl line 16.
>>
>> sentence-align-corpus.perl line 16 says:
>> die unless -e "$dir/$l1";
>>
>> Should I continue with europarl 2 or is it possible to download 
>> europarl 3 from somewhere?
>>
>> Alternatively would it be possible for you to explain the difference 
>> in purpose and format between wmt07/training/europarl-v3.fr-en.fr and 
>> wmt07/training/europarl-v3.en?  Just to clarify: am I correct in 
>> saying that the Prepare Data section is about training the 
>> translation model i.e. word and phrase alignments, and Language model 
>> section is about creating a language model for the language we're 
>> translating to?
>> Does the Prepare Data section start with two plain text parallel 
>> corpora with sentences on each line, or is it something more elaborate 
>> than that?  Maybe the wmt07/training/europarl-v3.fr-en.fr is a plain 
>> text file with French sentence 1 followed by English sentence 1 
>> followed by French sentence 2 followed by English sentence 2 etc?  I 
>> could then adapt the Welsh-English corpus I'm using accordingly.
>>
>> Otherwise, is there a problem with the software/implementation on a 
>> Mac system? Would you recommend that I try the recently released 
>> version of Moses?  Is there some way to install the new version of 
>> Moses without uninstalling the other one (I'm wondering about 
>> environment variables)
>>
>> Thank you,
>> Llio Humphreys
>>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


--
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
