Re: [Moses-support] FW: Moses: Prepare Data, Build Language Model and Train Model

Hieu Hoang Fri, 01 Aug 2008 09:59:28 -0700

these are the changes i got from Daniel Ortiz @ UPV a few months ago to make
it work under cygwin.


i think you'll also need to fiddle about with the scripts that use the
giza++ output to make it work. i don't have the code for that. do a find for
a3.final & A3.final etc.
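
For anyone doing that search, here is a minimal Python sketch of it. It is only an illustration: the function name find_a3_references is made up for this note, and the directory you point it at (for example your own copy of the training scripts) is your choice. It reports every line that mentions a3.final in any letter case.

```python
import os
import re

# Matches "a3.final" in any letter case, e.g. "a3.final" or "A3.final".
PATTERN = re.compile(r"a3\.final", re.IGNORECASE)


def find_a3_references(root):
    """Return (path, line_number, line) for each line under root mentioning a3.final.

    Hypothetical helper for illustration; it is not part of Moses or giza++.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going over non-UTF-8 files.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if PATTERN.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it
    return hits
```

Calling find_a3_references on the scripts directory and printing each tuple gives a list of places that would need to change if the giza++ output file were renamed.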

it may be quicker just to run it on a normal unix machine, rather than a
Mac. however, if you manage to sort it out, pls let the mailing list know
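
To check whether a given directory is affected by the overwrite described in the quoted message below, something like the following Python sketch may help. Both function names are invented for this illustration. is_case_sensitive writes a probe file and looks for it under a differently cased name; case_collisions takes a list of file names (for example a listing from a run on a case-sensitive machine) and reports the ones that would clash once case is ignored.

```python
import os
import tempfile
from collections import defaultdict


def is_case_sensitive(directory):
    """Probe whether the filesystem holding `directory` distinguishes letter case.

    Hypothetical helper: creates a scratch subdirectory, writes a lowercase
    probe file, and checks whether the same name with an uppercase A resolves.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        lower = os.path.join(tmp, "probe.a3.final")
        upper = os.path.join(tmp, "probe.A3.final")
        with open(lower, "w") as handle:
            handle.write("probe")
        # On a case-insensitive filesystem the uppercase name finds the same file.
        return not os.path.exists(upper)


def case_collisions(names):
    """Group file names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(set(group)) > 1]
```

If is_case_sensitive returns False for the working directory, giza++ output names such as blah.a3.final and blah.A3.final will land in the same file there.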


-----Original Message-----
From: Llio Humphreys [mailto:[EMAIL PROTECTED] 
Sent: 01 August 2008 17:48
To: Hieu Hoang
Cc: [email protected]
Subject: Re: FW: [Moses-support] Moses: Prepare Data, Build Language Model
and Train Model

Dear Hieu,
this is most useful.  Thank you very much for the lead.  Do you know the
giza program I need to amend?  I take it that the file should not be
overwritten.  Is this the same filename always or does it depend on the
input I give the system?
Many thanks,
Llio Humphreys

On Fri, Aug 1, 2008 at 5:07 PM, Hieu Hoang <[EMAIL PROTECTED]> wrote:
> this may be a similar problem to the one encountered by the UPV guys when 
> running under cygwin
>
> the Mac filesystem is case INSENSITIVE.
>        http://docs.info.apple.com/article.html?artnum=107863
> however, giza++ creates 2 files which have the same name but just 
> different cases, eg
>    blah.a3.final
>    blah.A3.final
> 1 overwrites the other.
>
> you need to change the giza++ code, or run under a case sensitive 
> filesystem. ideally, it should be changed in the trunk giza++ code
>
>
> -----Original Message-----
> From: Josh Schroeder [mailto:[EMAIL PROTECTED]
> Sent: 01 August 2008 16:56
> To: Hieu Hoang
> Subject: Fwd: [Moses-support] Moses: Prepare Data, Build Language 
> Model and Train Model
>
>
>
> Begin forwarded message:
>
>> From: "Llio Humphreys" <[EMAIL PROTECTED]>
>> Date: 25 July 2008 10:00:00 BST
>> To: moses-support <[email protected]>
>> Subject: [Moses-support] Moses: Prepare Data, Build Language Model 
>> and Train Model
>>
>> Please see message without attachment.  Thank you,  Llio Humphreys
>>
>> On Fri, Jul 25, 2008 at 9:50 AM, Llio Humphreys 
>> <[EMAIL PROTECTED]> wrote:
>>> Dear Moses Group,
>>>
>>> I am having difficulties running the Moses software (not the 
>>> recently released version), following the guidelines at 
>>> http://www.statmt.org/wmt07/baseline.html and I attach a record of 
>>> the final part of the terminal session for your information.
>>>
>>> I started with parallel input files, with each line containing one 
>>> sentence, both already tokenised, tab delimited, and in ASCII (is
>>> UTF-8 better?)
>>>
>>> I followed the instructions under the Prepare Data heading.  I 
>>> briefly inspected the .tok output files, and preferred the original 
>>> tokenised version e.g. reference numbers with / were not split up.
>>> So, I renamed the original input files as .tok files, filtered out 
>>> long sentences and lowercased the training data.
>>>
>>> I then proceeded to the Language Model. The instructions seemed 
>>> pretty much the same as for the Prepare Data section, so I moved the 
>>> lowercased files from the corpus directory to the lm directory. Is 
>>> this the right thing to do?
>>>
>>> I then trained the model and the system crashed with the following
>>> message:-
>>>
>>> Executing: bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract ./model/aligned.0.en ./model/aligned.0.cy ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
>>> PhraseExtract v1.3.0, written by Philipp Koehn
>>> phrase extraction from an aligned parallel corpus (also extracting orientation)
>>> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
>>> cat: ./model/extract.0-0.o.part*: No such file or directory
>>> Exit code: 1
>>> Died at bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl line 899.
>>>
>>> So, my question is: am I giving Moses the wrong data to work with?
>>>
>>> In order to find out, I downloaded europarl from 
>>> http://www.statmt.org/europarl/.  It contained version 2 rather than 
>>> version 3 but I thought nevertheless that I might try using it.  I 
>>> ran sentence-align-corpus.perl:
>>>
>>> ./sentence-align-corpus.perl en de
>>>
>>> , but it exited with the following message:
>>>
>>> Died at ./sentence-align-corpus.perl line 16.
>>>
>>> sentence-align-corpus.perl line 16 says:
>>> die unless -e "$dir/$l1";
>>>
>>> Should I continue with europarl 2 or is it possible to download 
>>> europarl 3 from somewhere?
>>>
>>> Alternatively would it be possible for you to explain the difference 
>>> in purpose and format between wmt07/training/europarl-v3.fr-en.fr 
>>> and wmt07/training/europarl-v3.en?  Just to clarify: am I correct in 
>>> saying that the Prepare Data section is about training the 
>>> translation model i.e. word and phrase alignments, and Language 
>>> model section is about creating a language model for the language 
>>> we're translating to?
>>> Does the Prepare Data section start with two plain text parallel 
>>> corpora with sentences on each line or  is something more elaborate 
>>> than that?  Maybe the wmt07/training/europarl-v3.fr-en.fr is a plain 
>>> text file with French sentence 1 followed by English sentence 1 
>>> followed by French sentence 2 followed by English sentence 2 etc?  I 
>>> could then adapt the Welsh-English corpus I'm using accordingly.
>>>
>>> Otherwise, is there a problem with the software/implementation on a 
>>> Mac system? Would you recommend that I try the recently released 
>>> version of Moses?  Is there some way to install the new version of 
>>> Moses without uninstalling the other one (I'm wondering about 
>>> environment variables)
>>>
>>> Thank you,
>>> Llio Humphreys
>>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> --
> The University of Edinburgh is a charitable body, registered in 
> Scotland, with registration number SC005336.
>
>

changes.tar.gz
Description: Binary data

