Hi Barry and Moses-support,

Thanks for getting back to me.  In answer to Barry's questions,
* I'm using giza, not mgiza.
* I've put my training.out at math.princeton.edu/~macbeth/training.out
* I've listed the files produced by the script at the end of this email.

Of the two errors I asked about earlier, one remains:  in Step 3, a
"sentence mismatch error" is reported on almost every sentence (lines
#41604-#158020).

I noticed an earlier thread
http://www.mail-archive.com/[email protected]/msg02130.html
in which something similar was reported.  There, Felipe Sánchez Martínez
suggested cleaning the corpus as a fix.  I had done that.  I'd be very
grateful for suggestions of other things to investigate.  Or is this
something I shouldn't worry about?
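
In case it helps anyone trying to narrow this down, here is a minimal sketch
of one check that could be run on the two alignment files.  It rests on
assumptions I have not verified against the Moses scripts: that each
sentence in an A3.final file starts with a header line of the form
"# Sentence pair (N) source length S target length T ...", and that a
mismatch means the two directions disagree on sentence lengths or on the
number of sentences.  The function names read_headers and first_mismatch are
my own, not part of Moses or GIZA++.

```python
import gzip
import re
from itertools import zip_longest

# Assumed header format of a GIZA++ A3.final file (not checked against the source).
HEADER = re.compile(
    r"^# Sentence pair \((\d+)\) source length (\d+) target length (\d+)"
)


def read_headers(path):
    """Yield (sentence_id, source_len, target_len) for each header line in path."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = HEADER.match(line)
            if match:
                yield tuple(int(group) for group in match.groups())


def first_mismatch(direct_path, inverse_path):
    """Return the 1-based position of the first sentence pair whose lengths
    are not mirror images in the two files, or None if all pairs agree.

    A file that runs out of sentences before the other also counts as a
    mismatch at that position.
    """
    pairs = zip_longest(read_headers(direct_path), read_headers(inverse_path))
    for position, (direct, inverse) in enumerate(pairs, start=1):
        if direct is None or inverse is None:
            return position
        # Source and target swap roles between the two training directions.
        if (direct[1], direct[2]) != (inverse[2], inverse[1]):
            return position
    return None
```

Calling first_mismatch("train/giza.en-de/en-de.A3.final.gz",
"train/giza.de-en/de-en.A3.final.gz") would then give the position of the
first sentence pair where the two files disagree, which could be compared
with the first line number in the error messages.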

(I had also asked about another error, in Step 2:  "Giza did not produce
the output file."  It didn't recur on re-running, and I think it may be
fixed -- I believe the problem last time might have been that I had not
cleared the working directory of files left over from previous partial
runs.)

Sincerely,
Heather Macbeth


** Files produced in working directory by Steps 1-3 **

.:
train
training [this is a script of mine]
training.out

./train:
corpus
giza.de-en
giza.en-de
model

./train/corpus:
de-en-int-train.snt
de.vcb
de.vcb.classes
de.vcb.classes.cats
en-de-int-train.snt
en.vcb
en.vcb.classes
en.vcb.classes.cats

./train/giza.de-en:
de-en.A3.final.gz
de-en.cooc
de-en.gizacfg

./train/giza.en-de:
en-de.A3.final.gz
en-de.cooc
en-de.gizacfg

./train/model:
aligned.grow-diag-final-and


On Wed, May 16, 2012 at 3:50 PM, Barry Haddow <[email protected]> wrote:

> Hi Heather
>
> Could you post the training.out file (or at least the step 2 part of it)?
>
> Are you using giza or mgiza?
>
> What files did giza produce?
>
> Cheers - Barry
>
> Sent from my ZX81
>
>
> ----- Reply message -----
> From: "Heather Macbeth" <[email protected]>
> Date: Wed, May 16, 2012 20:04
> Subject: [Moses-support] "Giza did not produce the output file" -- on
> cleaned corpus
> To: <[email protected]>
>
> Hi Moses-support,
>
> I'm looking for help on a problem that arose while building a baseline
> system.  Apart from changing FR to DE, I've tried to follow the
> instructions http://www.statmt.org/moses/?n=Moses.Baseline exactly.
>
> When I run the script train-model, the transcript training.out reports
>
> (line 2385) ERROR: Giza did not produce the output file
> train/giza.de-en/de-en.A3.final. Is your corpus clean (reasonably-sized
> sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> line 1077.
> (line 3285) ERROR: Giza did not produce the output file
> train/giza.en-de/en-de.A3.final. Is your corpus clean (reasonably-sized
> sentences)? at /home/heather/mosesdecoder/dist/training/train-model.perl
> line 1077.
>
> (I had indeed cleaned the corpus as instructed.  The output concluded with
> Input sentences: 158840  Output sentences:  158020
> So I take it this step went through ok.)
>
> It seems that people have had this problem before, for instance
> http://www.mail-archive.com/[email protected]/msg03434.html
>
> Barry Haddow's suggestion in that thread was to "have a look at the giza
> log file to see what went wrong. Maybe the merging of alignments failed."
> Does "giza log file" mean the Step 2 part of training.out?  If so, I've
> tried this, but I'm not exactly sure what I'm looking for.  There are a lot
> of WARNINGS, mainly of the form "already N iterations in hillclimb," but no
> other errors.
>
> Any suggestions for what symptoms to look for in the giza log file would be
> very welcome.
>
>
> In case it's relevant, let me mention another error that happens later
> (which I assume is a consequence of the first error):  during word
> alignments, a "sentence mismatch error" on almost every sentence.  Here's
> the relevant part of the transcript:  at the beginning of Step 3 (around
> line 5500):
>
> (3) generate word alignment @ Mon May 14 05:17:23 EDT 2012
> Combining forward and inverted alignment from files:
>  train/giza.de-en/de-en.A3.final.{bz2,gz}
>  train/giza.en-de/en-de.A3.final.{bz2,gz}
> Executing: mkdir -p train/model
> Executing: /home/heather/mosesdecoder/dist/training/symal/giza2bal.pl -d
> "gzip -cd train/giza.en-de/en-de.A3.final.gz" -i "gzip -cd
> train/giza.de-en/de-en.A3.final.gz"
> |/home/heather/mosesdecoder/dist/training/symal/symal -alignment="grow"
> -diagonal="yes" -final="yes" -both="yes" >
> train/model/aligned.grow-diag-final-and
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> Sentence mismatch error! Line #16665
> Sentence mismatch error! Line #16666
> Sentence mismatch error! Line #16667
> Sentence mismatch error! Line #16668
> Sentence mismatch error! Line #16669
> ....
> Sentence mismatch error! Line #158018
> Sentence mismatch error! Line #158019
> Sentence mismatch error! Line #158020
>
>
> Sincerely,
> Heather Macbeth
>
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
