Hi,

I'm using the latest git version of Moses, and it seems as if the training
pipeline is broken, because the format of aligned.grow-diag-final has changed.

I'm invoking train-model.perl as follows:

/vol/customopt/machine-translation/src/mosesdecoder/scripts/training/train-model.perl \
  -external-bin-dir /vol/customopt/machine-translation/bin -root-dir . \
  --corpus train --f fr --e en --first-step 1 --last-step 9 \
  -reordering msd-bidirectional-fe \
  --lm 0:3:/scratch/proycon/mosestest/train.fr.lm \
  -mgiza -mgiza-cpus 20 -cores 20 \
  -sort-buffer-size 10G -sort-batch-size 253 -sort-compress gzip -sort-parallel 20

It fails with warnings like this one on every sentence pair:

WARNING: Et is a bad alignment point in sentence 44968
T: If we do , I am sure we will be listened to .
S: Et lorsque nous serons capables de le faire , je suis sûr qu' ils nous écouteront .

Looking into the code of 'extract', I see that aligned.grow-diag-final is supposed
to consist of lines of space-separated %d-%d pairs (the alignment points). But my
aligned.grow-diag-final seems to be in a newer format and looks like this:

Je trouve que ce n' est pas acceptable . {##} I consider this to be unacceptable . {##} 0-0 1-1 2-1 3-2 6-4 4-5 5-5 6-5 7-5 8-6
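To illustrate why every line triggers a warning, here is a rough Python approximation of the check described above. It is not the actual 'extract' source; it only assumes that each space-separated token must match <int>-<int>. On a line in the new format, the first token is a source word, so it is rejected, just like "Et" in the warning.

```python
import re

# Hypothetical approximation of the check 'extract' applies to each line of
# aligned.grow-diag-final: every space-separated token must be "<int>-<int>".
POINT = re.compile(r"^(\d+)-(\d+)$")

def parse_alignment_line(line):
    """Return (points, bad_tokens) for one alignment line."""
    points, bad = [], []
    for tok in line.split():
        m = POINT.match(tok)
        if m:
            points.append((int(m.group(1)), int(m.group(2))))
        else:
            # Tokens like "Je" or "{##}" are not alignment points.
            bad.append(tok)
    return points, bad

old_style = "0-0 1-1 2-1 3-2"
new_style = ("Je trouve que ce n' est pas acceptable . {##} "
             "I consider this to be unacceptable . {##} 0-0 1-1 2-1 3-2")

print(parse_alignment_line(old_style))
print(parse_alignment_line(new_style)[1][0])
```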

The 'extract' program only expects the last part. So I manually stripped the
source and target sentences, leaving only the alignment points, and then it
worked. Is something going wrong in the training pipeline?
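For reference, the manual workaround can be scripted. This is a minimal sketch that assumes the three fields are always separated by "{##}" and that the alignment points are the last field; the script name and file paths in the usage line are hypothetical.

```python
import sys

SEP = "{##}"  # field separator seen in the new aligned.grow-diag-final format

def alignment_only(line):
    """Keep only the text after the last '{##}' separator (the alignment points).

    Lines without a separator (old format) are returned unchanged, minus
    surrounding whitespace.
    """
    return line.split(SEP)[-1].strip()

def convert(in_path, out_path):
    """Write an alignment-only copy of in_path to out_path, line by line."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(alignment_only(line) + "\n")

if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

Usage (hypothetical names): python strip_alignment.py model/aligned.grow-diag-final model/aligned.grow-diag-final.fixed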

Regards,

--

Maarten van Gompel
 Centre for Language Studies
 Radboud Universiteit Nijmegen

[email protected]
http://proycon.anaproy.nl
http://github.com/proycon

GnuPG key:  0x1A31555C  XMPP: [email protected]
Bitcoin:    1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd 

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
