Hi,
I'm trying to compute BLEU scores for several languages using the Europarl
corpus. I am following the instructions at
http://www.statmt.org/wmt07/baseline.html and
http://www.guardiani.us/index.php/Moses_Language_Model_Howto_v2.
When I translate from English to French, for example, I tokenize the corpus and
then try to filter out long sentences with clean-corpus-n.perl. The script dies
after a while with the error "europarl.tok.fr is too short!":
[acp08...@node95 europarl]$ ../darwin/darwin/bin/moses-scripts/scripts-20090221-2008/training/clean-corpus-n.perl aligned/corpus3/europarl.tok en fr aligned/corpus3/europarl.clean 1 40
clean-corpus.perl: processing aligned/corpus3/europarl.tok.en & .fr to aligned/corpus3/europarl.clean, cutoff 1-40
..........(100000)..........(200000)..........(300000)..........(400000)..........(500000)..........(600000)..........(700000)..........(800000)..........(900000)..........(1000000)........
aligned/corpus3/europarl.tok.fr is too short! at ../darwin/darwin/bin/moses-scripts/scripts-20090221-2008/training/clean-corpus-n.perl line 76, <E> line 1085695.
[acp08...@node95 europarl]$
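My guess is that the message means europarl.tok.fr has fewer lines than
europarl.tok.en, i.e. the two files are no longer line-aligned. Below is a
minimal Python sketch to check this, assuming one sentence per line in each
file. The function names count_lines and line_count_gap are my own and are not
part of the Moses scripts.

```python
import sys


def count_lines(path):
    """Count the lines in a file, including a final line with no trailing newline.

    The file is read in binary mode so that odd encodings cannot raise
    decoding errors while counting.
    """
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def line_count_gap(source_path, target_path):
    """Return source line count minus target line count (0 means equal length)."""
    return count_lines(source_path) - count_lines(target_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_lines.py europarl.tok.en europarl.tok.fr
    gap = line_count_gap(sys.argv[1], sys.argv[2])
    if gap == 0:
        print("Both files have the same number of lines.")
    else:
        print("Line counts differ by", abs(gap))
```

A non-zero result would only confirm that the files differ in length. It would
not show where the alignment first breaks.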
Could someone please tell me if there is something obvious that I'm missing?
Regards,
Aditya