Hi Barry and Moses-support,

Thanks for getting back to me. In answer to Barry's questions:

* I'm using giza, not mgiza.
* I've put my training.out at math.princeton.edu/~macbeth/training.out
* I've listed the files produced by the script at the end of this email.

Of the two errors I asked about earlier, one remains: in Step 3, a "sentence mismatch error" on almost every sentence. (Lines #41604-#158020.)

I noticed an earlier thread
http://www.mail-archive.com/[email protected]/msg02130.html
in which something similar was reported. There, Felipe Sánchez Martínez suggested cleaning the corpus as a fix. I had done that.

I'd be very grateful for suggestions of other things to investigate. Or is this something I shouldn't worry about?

(I had also asked about another ERROR: in Step 2, the "Giza did not produce the output file." It didn't recur on re-running, and I think it may be fixed -- I believe the problem last time might have been not stripping the working directory of files from previous partial runs.)

Sincerely,
Heather Macbeth

** Files produced in working directory by Steps 1-3 **

.:
train  training [this is a script of mine]  training.out

./train:
corpus  giza.de-en  giza.en-de  model

./train/corpus:
de-en-int-train.snt  de.vcb  de.vcb.classes  de.vcb.classes.cats  en-de-int-train.snt  en.vcb  en.vcb.classes  en.vcb.classes.cats

./train/giza.de-en:
de-en.A3.final.gz  de-en.cooc  de-en.gizacfg

./train/giza.en-de:
en-de.A3.final.gz  en-de.cooc  en-de.gizacfg

./train/model:
aligned.grow-diag-final-and

On Wed, May 16, 2012 at 3:50 PM, Barry Haddow <[email protected]> wrote:

> Hi Heather
>
> Could you post the training.out file (or at least the step 2 part of it)?
>
> Are you using giza or mgiza?
>
> What files did giza produce?
>
> Cheers - Barry
>
> Sent from my ZX81
>
>
> ----- Reply message -----
> From: "Heather Macbeth" <[email protected]>
> Date: Wed, May 16, 2012 20:04
> Subject: [Moses-support] "Giza did not produce the output file" -- on
> cleaned corpus
> To: <[email protected]>
>
> Hi Moses-support,
>
> I'm looking for help on a problem that arose while building a baseline
> system. Apart from changing FR to DE, I've tried to follow the
> instructions http://www.statmt.org/moses/?n=Moses.Baseline exactly.
>
> When I run the script train-model, the transcript training.out reports
>
> (line 2385) ERROR: Giza did not produce the output file
> train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> line 1077.
> (line 3285) ERROR: Giza did not produce the output file
> train/giza.en-de/en-de.A3.final. Is your corpus clean (reasonably-sized
> sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> line 1077.
>
> (I had indeed cleaned the corpus as instructed. The output concluded with
> Input sentences: 158840 Output sentences: 158020
> So I take it this step went through ok.)
>
> It seems that people have had this problem before, for instance
> http://www.mail-archive.com/[email protected]/msg03434.html
>
> Barry Haddow's suggestion in that thread was to "have a look at the giza
> log file to see what went wrong. Maybe the merging of alignments failed."
> Does "giza log file" mean the Step 2 part of training.out? If so, I've
> tried this, but I'm not exactly sure what I'm looking for. There are a lot
> of WARNINGS, mainly of the form "already N iterations in hillclimb," but no
> other errors.
>
> Any suggestions for what symptoms to look for in the giza log file would be
> very welcome.
>
>
> In case it's relevant, let me mention another error that happens later
> (which I assume is a consequence of the first error): during word
> alignments, a "sentence mismatch error" on almost every sentence. Here's
> the relevant part of the transcript: at the beginning of Step 3 (around
> line 5500):
>
> (3) generate word alignment @ Mon May 14 05:17:23 EDT 2012
> Combining forward and inverted alignment from files:
> train/giza.de-en/de-en.A3.final.{bz2,gz}
> train/giza.en-de/en-de.A3.final.{bz2,gz}
> Executing: mkdir -p train/model
> Executing: /home/heather/mosesdecoder/dist/training/symal/giza2bal.pl -d
> "gzip -cd train/giza.en-de/en-de.A3.final.gz" -i "gzip -cd
> train/giza.de-en/de-en.A3.final.gz"
> |/home/heather/mosesdecoder/dist/training/symal/symal -alignment="grow"
> -diagonal="yes" -final="yes" -both="yes" >
> train/model/aligned.grow-diag-final-and
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> Sentence mismatch error! Line #16665
> Sentence mismatch error! Line #16666
> Sentence mismatch error! Line #16667
> Sentence mismatch error! Line #16668
> Sentence mismatch error! Line #16669
> ....
> Sentence mismatch error! Line #158018
> Sentence mismatch error! Line #158019
> Sentence mismatch error! Line #158020
>
>
> Sincerely,
> Heather Macbeth
>
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
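
One way to narrow down where the "Sentence mismatch error" lines start is to compare the sentence-pair headers of the two A3.final.gz files directly. The following is only a minimal sketch, not part of Moses or GIZA++. It assumes that each sentence entry in an A3 file begins with a header of the form "# Sentence pair (N) source length X target length Y ...", and that the two directions should list the same pair numbers with source and target lengths swapped. The function names read_headers and first_mismatch are hypothetical helpers introduced here for illustration.

```python
import gzip
import re

# Assumed header format of a GIZA++ A3 file, for example:
# "# Sentence pair (1) source length 5 target length 6 alignment score : 1e-07"
HEADER = re.compile(
    r"^# Sentence pair \((\d+)\) source length (\d+) target length (\d+)"
)


def read_headers(path):
    """Return (pair_id, source_len, target_len) for each header line found."""
    headers = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = HEADER.match(line)
            if match:
                headers.append(tuple(int(g) for g in match.groups()))
    return headers


def first_mismatch(direct, inverse):
    """Return the 0-based index of the first disagreeing entry, or None.

    Assumption: two entries agree when they carry the same pair id and
    their lengths are mirrored (source in one direction is target in the
    other). If one list is shorter, the index just past its end is returned.
    """
    for index, (d, i) in enumerate(zip(direct, inverse)):
        if d[0] != i[0] or (d[1], d[2]) != (i[2], i[1]):
            return index
    if len(direct) != len(inverse):
        return min(len(direct), len(inverse))
    return None
```

Calling read_headers on train/giza.de-en/de-en.A3.final.gz and on train/giza.en-de/en-de.A3.final.gz, and passing both results to first_mismatch, gives the position of the first entry where the two directions stop agreeing. If the two lists also differ in length, that would suggest one direction is missing entries, which may be left over from an earlier partial run.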
