Hi Heather

It all looks quite normal until the sentence mismatch errors start, although I
didn't see this error:

> > (line 2385) ERROR: Giza did not produce the output file
> > train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > line 1077.

in your log file. From the log file and directory listings you gave me, it
seems that giza *did* produce its output (i.e. train/giza.de-en/de-en.A3.final
and the equivalent in the other direction). The sentence mismatch errors
indicate that the forward and backward giza outputs are not compatible.
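
One rough way to check that compatibility yourself is to compare the record headers of the two files. The sketch below is only illustrative: it assumes the usual GIZA++ A3 layout, where each record starts with a "# Sentence pair (N) source length X target length Y" line, and it assumes that the source and target lengths should simply be swapped between the two directions. The function names are made up for this example and are not part of Moses or GIZA++.

```python
import gzip
import re

# Header line of a GIZA++ A3 record (assumed layout), e.g.
# "# Sentence pair (1) source length 5 target length 6 alignment score : 1e-10"
HEADER = re.compile(
    r"^# Sentence pair \((\d+)\) source length (\d+) target length (\d+)"
)


def read_headers(path):
    """Return a list of (sentence_id, source_len, target_len) from an A3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    headers = []
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            m = HEADER.match(line)
            if m:
                headers.append(tuple(int(g) for g in m.groups()))
    return headers


def find_mismatches(forward_path, inverse_path, limit=10):
    """Compare two A3 files record by record and describe the disagreements.

    Assumption: record k in one direction should have its source/target
    lengths swapped relative to record k in the other direction.
    """
    fwd = read_headers(forward_path)
    inv = read_headers(inverse_path)
    problems = []
    if len(fwd) != len(inv):
        problems.append(f"record count differs: {len(fwd)} vs {len(inv)}")
    for pos, (a, b) in enumerate(zip(fwd, inv), start=1):
        if (a[1], a[2]) != (b[2], b[1]):
            problems.append(f"record {pos}: forward {a} vs inverse {b}")
            if len(problems) >= limit:
                break
    return problems
```

Calling find_mismatches("train/giza.de-en/de-en.A3.final.gz", "train/giza.en-de/en-de.A3.final.gz") should return an empty list if the two directions line up. If it reports a differing record count, or mismatches starting partway through, that would be consistent with one of the files coming from a different run.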

I wonder if you had some files left over from a previous run?

Could you try running train-model.perl from the start in a completely clean
directory, and if it fails, post the output? If the first failure is a sentence
mismatch error, maybe you could also post the de-en.A3.final.gz and
en-de.A3.final.gz files?
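
To confirm the working directory really is clean before rerunning, something along these lines could list anything left behind. This is just a sketch: the directory name "train" is taken from your listing, and the function name is made up for illustration.

```python
import os


def leftover_files(root="train"):
    """Return sorted relative paths of every file found under root.

    os.walk yields nothing for a missing directory, so an absent root
    gives an empty list, which is what a clean start should look like.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

An empty result from leftover_files("train") means there is nothing from an earlier partial run for the alignment step to pick up.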

cheers - Barry


On Thursday 17 May 2012 04:06:21 Heather Macbeth wrote:
> Hi Barry and Moses-support,
> 
> Thanks for getting back to me.  In answer to Barry's questions,
> * I'm using giza, not mgiza.
> * I've put my training.out at math.princeton.edu/~macbeth/training.out
> * I've listed the files produced by the script at the end of this email.
> 
> Of the two errors I asked about earlier, one remains:  In Step 3, a
> "sentence mismatch error" on almost every sentence.  (Lines
>  #41604-#158020.)
> 
> I noticed an earlier thread
> http://www.mail-archive.com/[email protected]/msg02130.html
> in which something similar was reported.  There, Felipe Sánchez
> Martínez suggested cleaning the corpus as a fix.  I had done that.  I'd
> be very grateful for suggestions of other things to investigate.  Or is
> this something I shouldn't worry about?
> 
> (I had also asked about another ERROR:  in Step 2, the "Giza did not
> produce the output file."  It didn't recur on re-running, and I think it
> may be fixed -- I believe the problem last time might have been not
> stripping the working directory of files from previous partial runs.)
> 
> Sincerely,
> Heather Macbeth
> 
> 
> ** Files produced in working directory by Steps 1-3 **
> 
> .:
> train
> training [this is a script of mine]
> training.out
> 
> ./train:
> corpus
> giza.de-en
> giza.en-de
> model
> 
> ./train/corpus:
> de-en-int-train.snt
> de.vcb
> de.vcb.classes
> de.vcb.classes.cats
> en-de-int-train.snt
> en.vcb
> en.vcb.classes
> en.vcb.classes.cats
> 
> ./train/giza.de-en:
> de-en.A3.final.gz
> de-en.cooc
> de-en.gizacfg
> 
> ./train/giza.en-de:
> en-de.A3.final.gz
> en-de.cooc
> en-de.gizacfg
> 
> ./train/model:
> aligned.grow-diag-final-and
> 
> On Wed, May 16, 2012 at 3:50 PM, Barry Haddow <[email protected]> wrote:
> > Hi Heather
> >
> > Could you post the training.out file (or at least the step 2 part of it)?
> >
> > Are you using giza or mgiza?
> >
> > What files did giza produce?
> >
> > Cheers - Barry
> >
> > Sent from my ZX81
> >
> >
> > ----- Reply message -----
> > From: "Heather Macbeth" <[email protected]>
> > Date: Wed, May 16, 2012 20:04
> > Subject: [Moses-support] "Giza did not produce the output file" -- on
> > cleaned corpus
> > To: <[email protected]>
> >
> > Hi Moses-support,
> >
> > I'm looking for help on a problem that arose while building a baseline
> > system.  Apart from changing FR to DE, I've tried to follow the
> > instructions http://www.statmt.org/moses/?n=Moses.Baseline exactly.
> >
> > When I run the script train-model, the transcript training.out reports
> >
> > (line 2385) ERROR: Giza did not produce the output file
> > train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > line 1077.
> > (line 3285) ERROR: Giza did not produce the output file
> > train/giza.en-de/en-de.A3.final. Is your corpus clean (reasonably-sized
> > sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> > line 1077.
> >
> > (I had indeed cleaned the corpus as instructed.  The output concluded
> > with Input sentences: 158840  Output sentences:  158020
> > So I take it this step went through ok.)
> >
> > It seems that people have had this problem before, for instance
> > http://www.mail-archive.com/[email protected]/msg03434.html
> >
> > Barry Haddow's suggestion in that thread was to "have a look at the giza
> > log file to see what went wrong. Maybe the merging of alignments failed."
> > Does "giza log file" mean the Step 2 part of training.out?  If so, I've
> > tried this, but I'm not exactly sure what I'm looking for.  There are a
> > lot of WARNINGS, mainly of the form "already N iterations in hillclimb,"
> > but no other errors.
> >
> > Any suggestions for what symptoms to look for in the giza log file would
> > be very welcome.
> >
> >
> > In case it's relevant, let me mention another error that happens later
> > (which I assume is a consequence of the first error):  during word
> > alignments, a "sentence mismatch error" on almost every sentence.  Here's
> > the relevant part of the transcript:  at the beginning of Step 3 (around
> > line 5500):
> >
> > (3) generate word alignment @ Mon May 14 05:17:23 EDT 2012
> > Combining forward and inverted alignment from files:
> >  train/giza.de-en/de-en.A3.final.{bz2,gz}
> >  train/giza.en-de/en-de.A3.final.{bz2,gz}
> > Executing: mkdir -p train/model
> > Executing: /home/heather/mosesdecoder/dist/training/symal/giza2bal.pl -d
> > "gzip -cd train/giza.en-de/en-de.A3.final.gz" -i "gzip -cd
> > train/giza.de-en/de-en.A3.final.gz"
> > |/home/heather/mosesdecoder/dist/training/symal/symal -alignment="grow"
> > -diagonal="yes" -final="yes" -both="yes" >
> > train/model/aligned.grow-diag-final-and
> > symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> > Sentence mismatch error! Line #16665
> > Sentence mismatch error! Line #16666
> > Sentence mismatch error! Line #16667
> > Sentence mismatch error! Line #16668
> > Sentence mismatch error! Line #16669
> > ....
> > Sentence mismatch error! Line #158018
> > Sentence mismatch error! Line #158019
> > Sentence mismatch error! Line #158020
> >
> >
> > Sincerely,
> > Heather Macbeth
> >
> >
> >
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> 
 
--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
