Hi,
I'm using the latest git version of Moses, and the training pipeline seems to
have broken because the format of aligned.grow-diag-final has changed.
I'm invoking train-model.perl as follows:
    /vol/customopt/machine-translation/src/mosesdecoder/scripts/training/train-model.perl \
        -external-bin-dir /vol/customopt/machine-translation/bin -root-dir . \
        --corpus train --f fr --e en --first-step 1 --last-step 9 \
        -reordering msd-bidirectional-fe \
        --lm 0:3:/scratch/proycon/mosestest/train.fr.lm \
        -mgiza -mgiza-cpus 20 -cores 20 \
        -sort-buffer-size 10G -sort-batch-size 253 \
        -sort-compress gzip -sort-parallel 20
It fails with warnings like this one on every sentence pair:
    WARNING: Et is a bad alignment point in sentence 44968
    T: If we do , I am sure we will be listened to .
    S: Et lorsque nous serons capables de le faire , je suis sûr qu' ils nous écouteront .
Looking into the code of 'extract', I see that aligned.grow-diag-final is
expected to contain one line per sentence pair, each a space-separated list of
%d-%d alignment points. My aligned.grow-diag-final, however, seems to be in a
newer format and looks like this:
    Je trouve que ce n' est pas acceptable . {##} I consider this to be unacceptable . {##} 0-0 1-1 2-1 3-2 6-4 4-5 5-5 6-5 7-5 8-6
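The warning is consistent with a parser that treats every space-separated token as an alignment point: the first word of a new-format line ("Et" for sentence 44968) is not of the form %d-%d. Below is a hypothetical Python re-creation of that check, assuming the behaviour described above; it is not the actual code of 'extract', and `parse_alignment` and `POINT` are illustrative names.

```python
import re

# Hypothetical re-creation of the check: each space-separated token must be
# "<source index>-<target index>", i.e. what "%d-%d" would match.
POINT = re.compile(r"^(\d+)-(\d+)$")


def parse_alignment(line):
    """Return a list of (src, tgt) pairs, or raise on the first bad token."""
    points = []
    for token in line.split():
        m = POINT.match(token)
        if m is None:
            raise ValueError(f"{token} is a bad alignment point")
        points.append((int(m.group(1)), int(m.group(2))))
    return points


old_style = "0-0 1-1 2-1 3-2"
new_style = ("Je trouve que ce n' est pas acceptable . {##} "
             "I consider this to be unacceptable . {##} "
             "0-0 1-1 2-1 3-2 6-4 4-5 5-5 6-5 7-5 8-6")

print(parse_alignment(old_style))  # [(0, 0), (1, 1), (2, 1), (3, 2)]
try:
    parse_alignment(new_style)
except ValueError as err:
    print(err)  # Je is a bad alignment point
```

The old-style line parses cleanly, while the new-style line fails on its first word, which matches the pattern of the warnings quoted earlier.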
The 'extract' program only expects the last part (the alignment points). When I
manually stripped the source and target sentences and kept only that part, it
worked. Is something going wrong in the training pipeline?
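The manual workaround can be scripted. This is a minimal Python sketch, assuming every new-format line uses `{##}` to separate source, target and alignment exactly as in the sample line; `strip_to_alignment` and `strip_file` are hypothetical helper names, not part of Moses. Lines already in the old format pass through unchanged.

```python
def strip_to_alignment(line: str) -> str:
    """Return only the alignment field of one line.

    For 'source {##} target {##} alignment' lines this keeps the text after
    the last '{##}'; lines without the separator are returned as-is, minus
    surrounding whitespace.
    """
    return line.split("{##}")[-1].strip()


def strip_file(in_path: str, out_path: str) -> None:
    """Write an old-style file: one line of '%d-%d' points per sentence pair."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_to_alignment(line) + "\n")
```

For example, `strip_file("aligned.grow-diag-final", "aligned.grow-diag-final.stripped")` (the output name is hypothetical) writes a file that 'extract' should accept. Taking the last field rather than a fixed position also keeps the line count intact for sentence pairs with an empty alignment, so line N still corresponds to sentence pair N.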
Regards,
--
Maarten van Gompel
Centre for Language Studies
Radboud Universiteit Nijmegen
http://proycon.anaproy.nl
http://github.com/proycon
GnuPG key: 0x1A31555C
Bitcoin: 1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support