Re: [Moses-support] FW: Moses: Prepare Data, Build Language Model and Train Model

Llio Humphreys Fri, 01 Aug 2008 10:20:55 -0700

Dear Hieu,
many thanks for the files.  I'm pleased that my efforts installing
Moses on a Mac are of interest to the mailing list :-) I take it that
I just replace main.cpp and model3.cpp in the giza folder?  I'm not
familiar with installing software, but I imagine that I will then need
to run giza's make file again?  Does this mean I need to re-install
Moses, Moses scripts, and the additional scripts as well?  I have
recently downloaded but not yet installed a more recent version of
Moses.  Are these problems resolved in the later version, or do they
still use the same version of Giza, so that I will still need to
replace main.cpp and model3.cpp?
Thanks,
Llio


On Fri, Aug 1, 2008 at 5:58 PM, Hieu Hoang <[EMAIL PROTECTED]> wrote:
> these are the changes i got from daniel ortiz @ UPV to make it work under
> cygwin a few months ago.
>
> i think you'll also need to fiddle about with the scripts that use the
> giza++ output to make it work. i don't have the code for that. do a find for
> a3.final & A3.final etc.
>
> it may be quicker just to run it on a normal unix machine, rather than a
> mac. however, if u managed to sort it out, pls let the mailing list know
>
>
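Hieu's suggestion to "do a find for a3.final & A3.final" can be sketched as a small Python script. It walks a scripts directory and reports every line that mentions either name, in any letter case. This is an illustrative helper, not part of Moses or GIZA++. The default directory `bin/moses-scripts` is a placeholder based on the install path in the log later in this thread, so pass the real location as an argument.

```python
import os
import re
import sys

# Matches "a3.final", "A3.final" (or any other casing) anywhere in a line.
PATTERN = re.compile(r"a3\.final", re.IGNORECASE)


def find_a3_references(root):
    """Yield (path, line_number, line) for every line under `root`
    that mentions a3.final in any letter case."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if PATTERN.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                # Unreadable files (permissions, broken links) are skipped.
                continue


if __name__ == "__main__":
    # Placeholder default; pass the actual scripts directory as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "bin/moses-scripts"
    for path, lineno, line in find_a3_references(root):
        print(f"{path}:{lineno}: {line}")
```

Each hit is a candidate place where a script reads one of the two GIZA++ files. Any script that needs to tell the two files apart would have to be changed alongside the GIZA++ source.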
> -----Original Message-----
> From: Llio Humphreys [mailto:[EMAIL PROTECTED]
> Sent: 01 August 2008 17:48
> To: Hieu Hoang
> Cc: [email protected]
> Subject: Re: FW: [Moses-support] Moses: Prepare Data, Build Language Model
> and Train Model
>
> Dear Hieu,
> this is most useful.  Thank you very much for the lead.  Do you know the
> giza program I need to amend?  I take it that the file should not be
> overwritten.  Is this the same filename always or does it depend on the
> input I give the system?
> Many thanks,
> Llio Humphreys
>
> On Fri, Aug 1, 2008 at 5:07 PM, Hieu Hoang <[EMAIL PROTECTED]> wrote:
>> this may be similar to a problem that the UPV guys encountered when
>> running under cygwin
>>
>> the Mac filesystem is case INSENSITIVE.
>>        http://docs.info.apple.com/article.html?artnum=107863
>> however, giza++ creates 2 files whose names differ only in case, eg
>>    blah.a3.final
>>    blah.A3.final
>> so one overwrites the other.
>>
>> you need to change the giza++ code, or run under a case-sensitive
>> filesystem. ideally, it should be changed in the trunk giza++ code
>>
>>
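You can check directly whether a given filesystem would trigger the clash Hieu describes. The minimal Python sketch below creates `blah.a3.final` in a temporary directory and then checks whether `blah.A3.final` resolves to the same file. It also lists names that would collide on a case-insensitive volume. The function names are illustrative only, and the file names mirror the example above.

```python
import os
import tempfile


def is_case_sensitive(directory="."):
    """Return True if files in `directory` whose names differ only in
    letter case are kept as distinct files."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        lower = os.path.join(tmp, "blah.a3.final")
        upper = os.path.join(tmp, "blah.A3.final")
        with open(lower, "w") as fh:
            fh.write("lower\n")
        # On a case-insensitive filesystem, `upper` names the same file.
        return not os.path.exists(upper)


def case_collisions(names):
    """Group names that would overwrite each other on a
    case-insensitive filesystem."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print("case-sensitive:", is_case_sensitive())
    print(case_collisions(["blah.a3.final", "blah.A3.final", "other.txt"]))
```

Run it from the directory where training writes its output. If it reports `False` on the Mac, that volume will show exactly the overwrite described above, and a case-sensitive volume (such as a disk image formatted as case-sensitive HFS+) is one way to follow Hieu's second suggestion.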
>> -----Original Message-----
>> From: Josh Schroeder [mailto:[EMAIL PROTECTED]
>> Sent: 01 August 2008 16:56
>> To: Hieu Hoang
>> Subject: Fwd: [Moses-support] Moses: Prepare Data, Build Language
>> Model and Train Model
>>
>>
>>
>> Begin forwarded message:
>>
>>> From: "Llio Humphreys" <[EMAIL PROTECTED]>
>>> Date: 25 July 2008 10:00:00 BST
>>> To: moses-support <[email protected]>
>>> Subject: [Moses-support] Moses: Prepare Data, Build Language Model
>>> and Train Model
>>>
>>> Please see message without attachment.  Thank you,  Llio Humphreys
>>>
>>> On Fri, Jul 25, 2008 at 9:50 AM, Llio Humphreys
>>> <[EMAIL PROTECTED]> wrote:
>>>> Dear Moses Group,
>>>>
>>>> I am having difficulties running the Moses software (not the
>>>> recently released version), following the guidelines at
>>>> http://www.statmt.org/wmt07/baseline.html and I attach a record of
>>>> the final part of the terminal session for your information.
>>>>
>>>> I started with parallel input files, with each line containing one
>>>> sentence, both already tokenised, tab delimited, and in ASCII (is
>>>> UTF-8 better?)
>>>>
>>>> I followed the instructions under the Prepare Data heading.  I
>>>> briefly inspected the .tok output files, and preferred the original
>>>> tokenised version e.g. reference numbers with / were not split up.
>>>> So, I renamed the original input files as .tok files, filtered out
>>>> long sentences and lowercased the training data.
>>>>
>>>> I then proceeded to the Language Model. The instructions seemed
>>>> pretty much the same as for the Prepare Data section, so I moved the
>>>> lowercased files from the corpus directory to the lm directory. Is
>>>> this the right thing to do?
>>>>
>>>> I then trained the model and the system crashed with the following
>>>> message:
>>>>
>>>> Executing: bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract ./model/aligned.0.en ./model/aligned.0.cy ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
>>>> PhraseExtract v1.3.0, written by Philipp Koehn
>>>> phrase extraction from an aligned parallel corpus (also extracting orientation)
>>>> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
>>>> cat: ./model/extract.0-0.o.part*: No such file or directory
>>>> Exit code: 1
>>>> Died at bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl line 899.
>>>>
>>>> So, my question is: am I giving Moses the wrong data to work with?
>>>>
>>>> In order to find out, I downloaded europarl from
>>>> http://www.statmt.org/europarl/.  It contained version 2 rather than
>>>> version 3, but I thought I might try using it nevertheless.  I ran
>>>> sentence-align-corpus.perl:
>>>>
>>>> ./sentence-align-corpus.perl en de
>>>>
>>>> but it exited with the following message:
>>>>
>>>> Died at ./sentence-align-corpus.perl line 16.
>>>>
>>>> sentence-align-corpus.perl line 16 says:
>>>> die unless -e "$dir/$l1";
>>>>
>>>> Should I continue with europarl 2 or is it possible to download
>>>> europarl 3 from somewhere?
>>>>
>>>> Alternatively would it be possible for you to explain the difference
>>>> in purpose and format between wmt07/training/europarl-v3.fr-en.fr
>>>> and wmt07/training/europarl-v3.en?  Just to clarify: am I correct in
>>>> saying that the Prepare Data section is about training the
>>>> translation model i.e. word and phrase alignments, and Language
>>>> model section is about creating a language model for the language
>>>> we're translating to?
>>>> Does the Prepare Data section start with two plain-text parallel
>>>> corpora with one sentence on each line, or is it something more elaborate
>>>> than that?  Maybe the wmt07/training/europarl-v3.fr-en.fr is a plain
>>>> text file with French sentence 1 followed by English sentence 1
>>>> followed by French sentence 2 followed by English sentence 2 etc?  I
>>>> could then adapt the Welsh-English corpus I'm using accordingly.
>>>>
>>>> Otherwise, is there a problem with the software/implementation on a
>>>> Mac system? Would you recommend that I try the recently released
>>>> version of Moses?  Is there some way to install the new version of
>>>> Moses without uninstalling the other one?  (I'm wondering about
>>>> environment variables.)
>>>>
>>>> Thank you,
>>>> Llio Humphreys
>>>>
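The data preparation described in this message can be sketched in Python: tokenised, tab-delimited input, then filtering out long sentences, then lowercasing. The sketch assumes the training scripts want two line-aligned plain-text files, one per language, rather than a single file with both languages interleaved or tab-separated. The separate per-language file names in the WMT07 baseline, such as `europarl-v3.fr-en.fr`, suggest this layout. The English-first column order, the `corpus.en` / `corpus.cy` output names and the 40-token limit are all assumptions for illustration. Reading as UTF-8 also covers plain ASCII input, since ASCII is a subset of UTF-8.

```python
def split_parallel(lines, max_tokens=40):
    """Split 'english<TAB>welsh' lines into two aligned, lowercased lists.

    A pair is dropped if the line does not have exactly two columns, or if
    either side is empty or longer than `max_tokens` whitespace tokens.
    Dropping whole pairs keeps the two outputs line-aligned.
    """
    en_out, cy_out = [], []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # malformed line: skip the whole pair
        en, cy = parts[0].strip(), parts[1].strip()
        n_en, n_cy = len(en.split()), len(cy.split())
        if not (1 <= n_en <= max_tokens and 1 <= n_cy <= max_tokens):
            continue  # empty or too long on either side
        en_out.append(en.lower())
        cy_out.append(cy.lower())
    return en_out, cy_out


def write_corpus(src_path, prefix="corpus", max_tokens=40):
    """Write <prefix>.en and <prefix>.cy; return the number of pairs kept."""
    with open(src_path, encoding="utf-8") as fh:
        en, cy = split_parallel(fh, max_tokens)
    for lang, sentences in (("en", en), ("cy", cy)):
        with open(f"{prefix}.{lang}", "w", encoding="utf-8") as out:
            out.writelines(s + "\n" for s in sentences)
    return len(en)
```

Line N of `corpus.en` is then the translation of line N of `corpus.cy`, which matches the pairing the log above relies on (`aligned.0.en` / `aligned.0.cy`).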
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>
