Dear Hieu,

Many thanks for the files. I'm pleased that my efforts installing Moses on a Mac are of interest to the mailing list :-)

I take it that I just replace main.cpp and model3.cpp in the giza folder? I'm not familiar with installing software, but I imagine that I will then need to run giza's makefile again? Does this mean I need to re-install Moses, the Moses scripts, and the additional scripts as well?

I have recently downloaded, but not yet installed, a more recent version of Moses. Are these problems resolved in the later version, or does it still use the same version of Giza, so that I will still need to replace main.cpp and model3.cpp?

Thanks,
Llio

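For anyone following the case-sensitivity point in the quoted thread below, here is a minimal sketch of how one might check whether a working directory sits on a case-insensitive volume, and which output names would clash there. It is not part of Moses or GIZA++; the function names `is_case_insensitive` and `case_collisions` are made up for illustration.

```python
import os
import tempfile
from collections import defaultdict


def is_case_insensitive(directory):
    """Return True if `directory` appears to be on a case-insensitive filesystem."""
    # Create a temporary file with a lowercase prefix, then look for the
    # same name in uppercase. The uppercase name only "exists" when the
    # filesystem ignores letter case.
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as handle:
        folder, name = os.path.split(handle.name)
        return os.path.exists(os.path.join(folder, name.upper()))


def case_collisions(filenames):
    """Group file names that differ only by letter case."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.lower()].append(name)
    # Keep only the groups that hold more than one distinct spelling.
    return [sorted(names) for names in groups.values() if len(set(names)) > 1]
```

Calling `is_case_insensitive` on the directory where GIZA++ writes its output, and `case_collisions` on the list of file names it is expected to produce (for example blah.a3.final and blah.A3.final), would show whether one file is likely to overwrite the other.
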
On Fri, Aug 1, 2008 at 5:58 PM, Hieu Hoang <[EMAIL PROTECTED]> wrote:
> these are the changes i got from daniel ortiz @ UPV to make it work under
> cygwin a few months ago.
>
> i think u'll also need to fiddle about with the scripts that use the
> giza++ output to make it work. i don't have the code for that. do a find for
> a3.final & A3.final etc.
>
> it may be quicker just to run it on a normal unix machine, rather than a
> mac. however, if u manage to sort it out, pls let the mailing list know
>
>
> -----Original Message-----
> From: Llio Humphreys [mailto:[EMAIL PROTECTED]
> Sent: 01 August 2008 17:48
> To: Hieu Hoang
> Cc: [email protected]
> Subject: Re: FW: [Moses-support] Moses: Prepare Data, Build Language Model
> and Train Model
>
> Dear Hieu,
> this is most useful. Thank you very much for the lead. Do you know the
> giza program I need to amend? I take it that the file should not be
> overwritten. Is this the same filename always or does it depend on the
> input I give the system?
> Many thanks,
> Llio Humphreys
>
> On Fri, Aug 1, 2008 at 5:07 PM, Hieu Hoang <[EMAIL PROTECTED]> wrote:
>> this may be a similar problem to the one encountered by the UPV guys when
>> running under cygwin
>>
>> the Mac filesystem is case INSENSITIVE.
>> http://docs.info.apple.com/article.html?artnum=107863
>> however, giza++ creates 2 files which have the same name but just
>> different cases, eg
>> blah.a3.final
>> blah.A3.final
>> 1 overwrites the other.
>>
>> you need to change the giza++ code, or run under a case sensitive
>> filesystem.
>> ideally, it should be changed in the trunk giza++ code
>>
>>
>> -----Original Message-----
>> From: Josh Schroeder [mailto:[EMAIL PROTECTED]
>> Sent: 01 August 2008 16:56
>> To: Hieu Hoang
>> Subject: Fwd: [Moses-support] Moses: Prepare Data, Build Language
>> Model and Train Model
>>
>>
>>
>> Begin forwarded message:
>>
>>> From: "Llio Humphreys" <[EMAIL PROTECTED]>
>>> Date: 25 July 2008 10:00:00 BST
>>> To: moses-support <[email protected]>
>>> Subject: [Moses-support] Moses: Prepare Data, Build Language Model
>>> and Train Model
>>>
>>> Please see message without attachment. Thank you, Llio Humphreys
>>>
>>> On Fri, Jul 25, 2008 at 9:50 AM, Llio Humphreys
>>> <[EMAIL PROTECTED]> wrote:
>>>> Dear Moses Group,
>>>>
>>>> I am having difficulties running the Moses software (not the
>>>> recently released version), following the guidelines at
>>>> http://www.statmt.org/wmt07/baseline.html and I attach a record of
>>>> the final part of the terminal session for your information.
>>>>
>>>> I started with parallel input files, with each line containing one
>>>> sentence, both already tokenised, tab delimited, and in ASCII (is
>>>> UTF-8 better?)
>>>>
>>>> I followed the instructions under the Prepare Data heading. I
>>>> briefly inspected the .tok output files, and preferred the original
>>>> tokenised version e.g. reference numbers with / were not split up.
>>>> So, I renamed the original input files as .tok files, filtered out
>>>> long sentences and lowercased the training data.
>>>>
>>>> I then proceeded to the Language Model. The instructions seemed
>>>> pretty much the same as for the Prepare Data section, so I moved the
>>>> lowercased files from the corpus directory to the lm directory. Is
>>>> this the right thing to do?
>>>>
>>>> I then trained the model and the system crashed with the following
>>>> message:-
>>>>
>>>> Executing: bin/moses-scripts/scripts-20080125-1939/training/phrase-extract/extract ./model/aligned.0.en ./model/aligned.0.cy ./model/aligned.grow-diag-final-and ./model/extract.0-0 7 orientation
>>>> PhraseExtract v1.3.0, written by Philipp Koehn
>>>> phrase extraction from an aligned parallel corpus
>>>> (also extracting orientation)
>>>> Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
>>>> cat: ./model/extract.0-0.o.part*: No such file or directory
>>>> Exit code: 1
>>>> Died at bin/moses-scripts/scripts-20080125-1939/training/train-factored-phrase-model.perl line 899.
>>>>
>>>> So, my question is: am I giving Moses the wrong data to work with?
>>>>
>>>> In order to find out, I downloaded europarl from
>>>> http://www.statmt.org/europarl/. It contained version 2 rather than
>>>> version 3 but I thought nevertheless that I might try using it. I
>>>> ran sentence-align-corpus.perl:
>>>>
>>>> ./sentence-align-corpus.perl en de
>>>>
>>>> , but it exited with the following message:
>>>>
>>>> Died at ./sentence-align-corpus.perl line 16.
>>>>
>>>> sentence-align-corpus.perl line 16 says:
>>>> die unless -e "$dir/$l1";
>>>>
>>>> Should I continue with europarl 2 or is it possible to download
>>>> europarl 3 from somewhere?
>>>>
>>>> Alternatively would it be possible for you to explain the difference
>>>> in purpose and format between wmt07/training/europarl-v3.fr-en.fr
>>>> and wmt07/training/europarl-v3.en? Just to clarify: am I correct in
>>>> saying that the Prepare Data section is about training the
>>>> translation model i.e. word and phrase alignments, and Language
>>>> model section is about creating a language model for the language
>>>> we're translating to?
>>>> Does the Prepare Data section start with two plain text parallel
>>>> corpora with sentences on each line or is something more elaborate
>>>> than that? Maybe the wmt07/training/europarl-v3.fr-en.fr is a plain
>>>> text file with French sentence 1 followed by English sentence 1
>>>> followed by French sentence 2 followed by English sentence 2 etc? I
>>>> could then adapt the Welsh-English corpus I'm using accordingly.
>>>>
>>>> Otherwise, is there a problem with the software/implementation on a
>>>> Mac system? Would you recommend that I try the recently released
>>>> version of Moses? Is there some way to install the new version of
>>>> Moses without uninstalling the other one (I'm wondering about
>>>> environment variables)
>>>>
>>>> Thank you,
>>>> Llio Humphreys
>>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
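On the data-format question in the original message above, a rough sketch may help. It assumes the usual layout for Moses training data: two separate plain-text files, one per language, where line N of one file is the translation of line N of the other (not a single file with the languages interleaved). The function name `clean_parallel` and the default limit of 40 tokens are illustrative choices only, not something taken from the Moses scripts.

```python
def clean_parallel(src_lines, tgt_lines, max_len=40):
    """Lowercase sentence pairs and drop empty or over-long ones.

    src_lines and tgt_lines are lists of already tokenised sentences,
    where src_lines[i] is assumed to be the translation of tgt_lines[i].
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            "line counts differ: %d vs %d" % (len(src_lines), len(tgt_lines))
        )
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        # Skip pairs where either side is empty.
        if not src_tokens or not tgt_tokens:
            continue
        # Skip pairs where either side exceeds the token limit.
        if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
            continue
        kept.append((" ".join(src_tokens).lower(), " ".join(tgt_tokens).lower()))
    return kept
```

Reading the Welsh and English files into two lists and passing them to this function would fail early if the files are not line-aligned, which is one plausible way for the later training steps to produce no output.
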
