Hi Barry and Moses-support,

Thanks again for your detailed suggestions. I believe I've fixed the errors I asked about.

With the "sentence mismatch error," the problem was that at that line the news-commentary-v7.de-en parallel corpora apparently go out of sync. Is this known? Or deliberate?

41812:Es war Deutschlands glücklichste Nacht.
41813:Betrachtet man sich deren Folgen zwanzig Jahre später, so liegen revolutionäre Veränderungen hinter uns:
41814:
41815:
41816:Die Sowjetunion und ihr Imperium sind sang- und klanglos verschwunden und mit ihr die ganze Weltordnung des Kalten Krieges.
41817:Deutschland wurde wiedervereinigt, Osteuropa und die Staaten der sowjetischen Peripherie gewannen ihre Unabhängigkeit, das Apartheid-Regime in Südafrika löste sich auf, zahlreiche Bürgerkriege in Asien, Afrika und Lateinamerika fanden ein Ende, im Nahen Osten kamen Israelis und Palästinenser einem Frieden so nah, wie seitdem nicht mehr, und das auseinander brechende Jugoslawien versank in Krieg und ethnischer Säuberung. In Afghanistan ging der Krieg unter neuen Vorzeichen weiter, und das sollte Konsequenzen haben.
41818:Die USA waren der siegreiche Erbe der zusammengebrochenen Ordnung des Kalten Krieges und standen allein und unangefochten auf dem Gipfel ihrer globalen Macht.

41812:It was Germany's happiest hour.
41813:Twenty years later, many revolutionary consequences of that night lie behind us. The Soviet Union and its empire quietly disappeared, and with them the Cold War international order.
41814:Germany was reunited; Eastern Europe and the states on the Soviet periphery won their independence; South Africa's apartheid regime fell apart, numerous civil wars in Asia, Africa, and Latin America ended; Israelis and Palestinians came closer to peace than at any time since; and a disintegrating Yugoslavia degenerated into war and ethnic cleansing.
41815:In Afghanistan, war continued under different circumstances, with serious ramifications for the region and, indeed, the world.
41816:As the victorious heir to the collapsed Cold War order, the United States stood alone, undisputed, at the peak of its global power.

(The "Giza did not produce the output file" ERROR I had previously fixed, quite along the lines Barry Haddow suggested -- see my post of Thu 17 May, 3 am GMT.)

Sincerely,
Heather Macbeth

On Thu, May 17, 2012 at 3:57 AM, Barry Haddow <[email protected]> wrote:
> Hi Heather
>
> It all looks quite normal until the sentence mismatch errors start. Although I
> didn't see this error:
>
> > > (line 2385) ERROR: Giza did not produce the output file
> > > train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> > > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > > line 1077.
>
> in your log file. From the log file and directory listings you gave me, it
> seemed that giza *did* produce its output (ie train/giza.de-en/de-en.A3.final
> and the equivalent in the other direction). The sentence mismatch errors
> indicate that the forward and backward giza outputs are not compatible.
>
> I wonder if you had some files left over from previous run?
>
> Could you try running train-model.perl from the start on a completely clean
> directory, and if it fails, post the output? If the first failure is a sentence
> mismatch error, maybe you could post the de-en.A3.final.gz and
> en-de.A3.final.gz files,
>
> cheers - Barry
>
>
> On Thursday 17 May 2012 04:06:21 Heather Macbeth wrote:
> > Hi Barry and Moses-support,
> >
> > Thanks for getting back to me. In answer to Barry's questions,
> > * I'm using giza, not mgiza.
> > * I've put my training.out at math.princeton.edu/~macbeth/training.out
> > * I've listed the files produced by the script at the end of this email.
> >
> > Of the two errors I asked about earlier, one remains: In Step 3, a
> > "sentence mismatch error" on almost every sentence. (Lines
> > #41604-#158020.)
> >
> > I noticed an earlier thread
> > http://www.mail-archive.com/[email protected]/msg02130.html
> > in which something similar was reported. There, Felipe Sánchez
> > Martínez suggested cleaning the corpus as a fix. I had done that. I'd
> > be very grateful for suggestions of other things to investigate. Or is
> > this something I shouldn't worry about?
> >
> > (I had also asked about another ERROR: in Step 2, the "Giza did not
> > produce the output file." It didn't recur on re-running, and I think it
> > may be fixed -- I believe the problem last time might have been not
> > stripping the working directory of files from previous partial runs.)
> >
> > Sincerely,
> > Heather Macbeth
> >
> >
> > ** Files produced in working directory by Steps 1-3 **
> >
> > .:
> > train
> > training [this is a script of mine]
> > training.out
> >
> > ./train:
> > corpus
> > giza.de-en
> > giza.en-de
> > model
> >
> > ./train/corpus:
> > de-en-int-train.snt
> > de.vcb
> > de.vcb.classes
> > de.vcb.classes.cats
> > en-de-int-train.snt
> > en.vcb
> > en.vcb.classes
> > en.vcb.classes.cats
> >
> > ./train/giza.de-en:
> > de-en.A3.final.gz
> > de-en.cooc
> > de-en.gizacfg
> >
> > ./train/giza.en-de:
> > en-de.A3.final.gz
> > en-de.cooc
> > en-de.gizacfg
> >
> > ./train/model:
> > aligned.grow-diag-final-and
> >
> > On Wed, May 16, 2012 at 3:50 PM, Barry Haddow <[email protected]> wrote:
> > > Hi Heather
> > >
> > > Could you post the training.out file (or at least the step 2 part of it)?
> > >
> > > Are you using giza or mgiza?
> > >
> > > What files did giza produce?
> > >
> > > Cheers - Barry
> > >
> > > Sent from my ZX81
> > >
> > >
> > > ----- Reply message -----
> > > From: "Heather Macbeth" <[email protected]>
> > > Date: Wed, May 16, 2012 20:04
> > > Subject: [Moses-support] "Giza did not produce the output file" -- on
> > > cleaned corpus
> > > To: <[email protected]>
> > >
> > > Hi Moses-support,
> > >
> > > I'm looking for help on a problem that arose while building a baseline
> > > system. Apart from changing FR to DE, I've tried to follow the
> > > instructions http://www.statmt.org/moses/?n=Moses.Baseline exactly.
> > >
> > > When I run the script train-model, the transcript training.out reports
> > >
> > > (line 2385) ERROR: Giza did not produce the output file
> > > train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> > > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > > line 1077.
> > > (line 3285) ERROR: Giza did not produce the output file
> > > train/giza.en-de/en-de.A3.final. Is your corpus clean (reasonably-sized
> > > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > > line 1077.
> > >
> > > (I had indeed cleaned the corpus as instructed. The output concluded
> > > with Input sentences: 158840 Output sentences: 158020
> > > So I take it this step went through ok.)
> > >
> > > It seems that people have had this problem before, for instance
> > > http://www.mail-archive.com/[email protected]/msg03434.html
> > >
> > > Barry Haddow's suggestion in that thread was to "have a look at the giza
> > > log file to see what went wrong. Maybe the merging of alignments failed."
> > > Does "giza log file" mean the Step 2 part of training.out? If so, I've
> > > tried this, but I'm not exactly sure what I'm looking for. There are a
> > > lot of WARNINGS, mainly of the form "already N iterations in hillclimb,"
> > > but no other errors.
> > >
> > > Any suggestions for what symptoms to look for in the giza log file would
> > > be very welcome.
> > >
> > >
> > > In case it's relevant, let me mention another error that happens later
> > > (which I assume is a consequence of the first error): during word
> > > alignments, a "sentence mismatch error" on almost every sentence. Here's
> > > the relevant part of the transcript: at the beginning of Step 3 (around
> > > line 5500):
> > >
> > > (3) generate word alignment @ Mon May 14 05:17:23 EDT 2012
> > > Combining forward and inverted alignment from files:
> > > train/giza.de-en/de-en.A3.final.{bz2,gz}
> > > train/giza.en-de/en-de.A3.final.{bz2,gz}
> > > Executing: mkdir -p train/model
> > > Executing: /home/heather/mosesdecoder/dist/training/symal/giza2bal.pl -d
> > > "gzip -cd train/giza.en-de/en-de.A3.final.gz" -i "gzip -cd
> > > train/giza.de-en/de-en.A3.final.gz"
> > > |/home/heather/mosesdecoder/dist/training/symal/symal -alignment="grow"
> > > -diagonal="yes" -final="yes" -both="yes" >
> > > train/model/aligned.grow-diag-final-and
> > > symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> > > Sentence mismatch error! Line #16665
> > > Sentence mismatch error! Line #16666
> > > Sentence mismatch error! Line #16667
> > > Sentence mismatch error! Line #16668
> > > Sentence mismatch error! Line #16669
> > > ....
> > > Sentence mismatch error! Line #158018
> > > Sentence mismatch error! Line #158019
> > > Sentence mismatch error! Line #158020
> > >
> > >
> > > Sincerely,
> > > Heather Macbeth
> > >
> > >
> > >
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.
> >
>
> --
> Barry Haddow
> University of Edinburgh
> +44 (0) 131 651 3173
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
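
One way to locate the kind of desynchronization described at the top of this message (lines 41814 and 41815 are blank on the German side but not on the English side) is to read both halves of the corpus in parallel and report every line number where either side is empty. The following is only a minimal Python sketch under that assumption; it is not part of Moses, and the function names `find_blank_mismatches` and `report` are invented here for illustration.

```python
from itertools import zip_longest


def find_blank_mismatches(src_lines, tgt_lines):
    """Yield (line_number, src_blank, tgt_blank) for each 1-based line
    where at least one side is blank or missing."""
    # zip_longest pads the shorter input with None, so a difference in
    # total line count is reported as well.
    pairs = zip_longest(src_lines, tgt_lines)
    for line_number, (src, tgt) in enumerate(pairs, start=1):
        src_blank = src is None or not src.strip()
        tgt_blank = tgt is None or not tgt.strip()
        if src_blank or tgt_blank:
            yield line_number, src_blank, tgt_blank


def report(src_path, tgt_path, limit=20):
    """Print the first `limit` suspicious line numbers for two corpus files."""
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        hits = find_blank_mismatches(src, tgt)
        for count, (number, s_blank, t_blank) in enumerate(hits):
            if count >= limit:
                break
            print(f"line {number}: source blank={s_blank}, target blank={t_blank}")
```

A call such as `report("corpus.de", "corpus.en")` (hypothetical file names standing in for the two halves of the parallel corpus) prints the first few line numbers worth inspecting by hand. A line that is blank on only one side is a hint that the files may have drifted apart at that point, not proof of it.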
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
