Hi Heather

It all looks quite normal until the sentence mismatch errors start, although I didn't see this error in your log file:

> > (line 2385) ERROR: Giza did not produce the output file
> > train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > line 1077.

From the log file and directory listings you gave me, it seems that giza *did* produce its output (i.e. train/giza.de-en/de-en.A3.final and the equivalent in the other direction).

The sentence mismatch errors indicate that the forward and backward giza outputs are not compatible. I wonder if you had some files left over from a previous run? Could you try running train-model.perl from the start in a completely clean directory and, if it fails, post the output? If the first failure is a sentence mismatch error, maybe you could also post the de-en.A3.final.gz and en-de.A3.final.gz files.

cheers - Barry

On Thursday 17 May 2012 04:06:21 Heather Macbeth wrote:
> Hi Barry and Moses-support,
>
> Thanks for getting back to me. In answer to Barry's questions,
> * I'm using giza, not mgiza.
> * I've put my training.out at math.princeton.edu/~macbeth/training.out
> * I've listed the files produced by the script at the end of this email.
>
> Of the two errors I asked about earlier, one remains: In Step 3, a
> "sentence mismatch error" on almost every sentence. (Lines
> #41604-#158020.)
>
> I noticed an earlier thread
> http://www.mail-archive.com/[email protected]/msg02130.html
> in which something similar was reported. There, Felipe Sánchez
> Martínez suggested cleaning the corpus as a fix. I had done that. I'd
> be very grateful for suggestions of other things to investigate. Or is
> this something I shouldn't worry about?
>
> (I had also asked about another ERROR: in Step 2, the "Giza did not
> produce the output file." It didn't recur on re-running, and I think it
> may be fixed -- I believe the problem last time might have been not
> stripping the working directory of files from previous partial runs.)
>
> Sincerely,
> Heather Macbeth
>
>
> ** Files produced in working directory by Steps 1-3 **
>
> .:
> train
> training [this is a script of mine]
> training.out
>
> ./train:
> corpus
> giza.de-en
> giza.en-de
> model
>
> ./train/corpus:
> de-en-int-train.snt
> de.vcb
> de.vcb.classes
> de.vcb.classes.cats
> en-de-int-train.snt
> en.vcb
> en.vcb.classes
> en.vcb.classes.cats
>
> ./train/giza.de-en:
> de-en.A3.final.gz
> de-en.cooc
> de-en.gizacfg
>
> ./train/giza.en-de:
> en-de.A3.final.gz
> en-de.cooc
> en-de.gizacfg
>
> ./train/model:
> aligned.grow-diag-final-and
>
> On Wed, May 16, 2012 at 3:50 PM, Barry Haddow <[email protected]> wrote:
> > Hi Heather
> >
> > Could you post the training.out file (or at least the step 2 part of it)?
> >
> > Are you using giza or mgiza?
> >
> > What files did giza produce?
> >
> > Cheers - Barry
> >
> > Sent from my ZX81
> >
> >
> > ----- Reply message -----
> > From: "Heather Macbeth" <[email protected]>
> > Date: Wed, May 16, 2012 20:04
> > Subject: [Moses-support] "Giza did not produce the output file" -- on
> > cleaned corpus
> > To: <[email protected]>
> >
> > Hi Moses-support,
> >
> > I'm looking for help on a problem that arose while building a baseline
> > system. Apart from changing FR to DE, I've tried to follow the
> > instructions http://www.statmt.org/moses/?n=Moses.Baseline exactly.
> >
> > When I run the script train-model, the transcript training.out reports
> >
> > (line 2385) ERROR: Giza did not produce the output file
> > train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > line 1077.
> > (line 3285) ERROR: Giza did not produce the output file
> > train/giza.en-de/en-de.A3.final. Is your corpus clean (reasonably-sized
> > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > line 1077.
> >
> > (I had indeed cleaned the corpus as instructed. The output concluded
> > with Input sentences: 158840 Output sentences: 158020
> > So I take it this step went through ok.)
> >
> > It seems that people have had this problem before, for instance
> > http://www.mail-archive.com/[email protected]/msg03434.html
> >
> > Barry Haddow's suggestion in that thread was to "have a look at the giza
> > log file to see what went wrong. Maybe the merging of alignments failed."
> > Does "giza log file" mean the Step 2 part of training.out? If so, I've
> > tried this, but I'm not exactly sure what I'm looking for. There are a
> > lot of WARNINGS, mainly of the form "already N iterations in hillclimb,"
> > but no other errors.
> >
> > Any suggestions for what symptoms to look for in the giza log file would
> > be very welcome.
> >
> >
> > In case it's relevant, let me mention another error that happens later
> > (which I assume is a consequence of the first error): during word
> > alignments, a "sentence mismatch error" on almost every sentence. Here's
> > the relevant part of the transcript: at the beginning of Step 3 (around
> > line 5500):
> >
> > (3) generate word alignment @ Mon May 14 05:17:23 EDT 2012
> > Combining forward and inverted alignment from files:
> > train/giza.de-en/de-en.A3.final.{bz2,gz}
> > train/giza.en-de/en-de.A3.final.{bz2,gz}
> > Executing: mkdir -p train/model
> > Executing: /home/heather/mosesdecoder/dist/training/symal/giza2bal.pl -d
> > "gzip -cd train/giza.en-de/en-de.A3.final.gz" -i "gzip -cd
> > train/giza.de-en/de-en.A3.final.gz" |
> > /home/heather/mosesdecoder/dist/training/symal/symal -alignment="grow"
> > -diagonal="yes" -final="yes" -both="yes" >
> > train/model/aligned.grow-diag-final-and
> > symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> > Sentence mismatch error! Line #16665
> > Sentence mismatch error! Line #16666
> > Sentence mismatch error! Line #16667
> > Sentence mismatch error! Line #16668
> > Sentence mismatch error! Line #16669
> > ....
> > Sentence mismatch error! Line #158018
> > Sentence mismatch error! Line #158019
> > Sentence mismatch error! Line #158020
> >
> >
> > Sincerely,
> > Heather Macbeth
> >
> >
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
>

--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173
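
One way to check whether the forward and backward giza outputs are compatible, before posting the files, is to compare the "# Sentence pair" header lines of the two A3.final.gz files. The Python sketch below is not part of Moses or GIZA++. It assumes that each sentence in an A3 file begins with a header of the form "# Sentence pair (N) source length X target length Y ...", and that a mismatch corresponds to the two directions disagreeing on those lengths. The function names are made up for illustration.

```python
import gzip
import re

# Assumed A3 header format:
#   # Sentence pair (1) source length 9 target length 6 alignment score : ...
HEADER = re.compile(
    r"^# Sentence pair \((\d+)\) source length (\d+) target length (\d+)"
)


def read_headers(lines):
    """Yield (source_length, target_length) for each sentence header found."""
    for line in lines:
        match = HEADER.match(line)
        if match:
            yield int(match.group(2)), int(match.group(3))


def compare_directions(forward_lines, backward_lines):
    """Compare two alignment directions sentence by sentence.

    Returns (forward_count, backward_count, mismatches), where mismatches
    lists the 1-based positions at which the forward (source, target)
    lengths are not the backward lengths swapped.
    """
    forward = list(read_headers(forward_lines))
    backward = list(read_headers(backward_lines))
    mismatches = [
        position
        for position, (fwd, bwd) in enumerate(zip(forward, backward), start=1)
        if fwd != (bwd[1], bwd[0])
    ]
    return len(forward), len(backward), mismatches


def compare_files(forward_path, backward_path):
    """Run compare_directions on two gzip-compressed A3 files."""
    with gzip.open(forward_path, "rt", encoding="utf-8", errors="replace") as fwd, \
         gzip.open(backward_path, "rt", encoding="utf-8", errors="replace") as bwd:
        return compare_directions(fwd, bwd)
```

With the directory layout listed above, it could be called as compare_files("train/giza.de-en/de-en.A3.final.gz", "train/giza.en-de/en-de.A3.final.gz"). Different sentence counts, or a long run of mismatches starting partway through, would be consistent with one of the two files being left over from an earlier run.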
