[Moses-support] correct corpus

Korzec, Sanne Mon, 15 Mar 2010 05:33:08 -0700

Hi all,
 
I am still trying to figure out why my BLEU baseline score differs from the 
scores reported in the literature, so I am backtracking through every step. 
Maybe you could help me out with the following:
 
I would like to train on the same Europarl corpus that was used for WMT08. 
I downloaded it from the following location. Could you please tell me whether 
this is correct?
 
preprocessing: 
http://www.statmt.org/wmt08/training-parallel.tar
 
For a FR->EN system I use: 
europarl-v3b.fr-en.en.gz 
europarl-v3b.fr-en.fr.gz 
 
In the WMT08 baseline system, at the prepare-data step, the unpacked files are called 
wmt08/training/europarl-v3.fr-en.fr 
wmt08/training/europarl-v3.fr-en.en
 
There is no "b" in these names. Are these the same (and the correct) files?
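
A byte-level comparison would settle whether the two names refer to the same 
text. Below is a minimal Python sketch of such a check, assuming both copies 
are available on disk. The function names corpus_fingerprint and same_corpus 
are made up for illustration and are not part of Moses or the WMT08 scripts.

```python
import gzip
import hashlib


def corpus_fingerprint(path):
    """Return (line_count, sha256_hex) of a corpus file's uncompressed bytes.

    Files ending in .gz are decompressed on the fly, so a packed and an
    unpacked copy of the same text give the same fingerprint.
    """
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.sha256()
    lines = 0
    with opener(path, "rb") as handle:
        for line in handle:
            digest.update(line)
            lines += 1
    return lines, digest.hexdigest()


def same_corpus(path_a, path_b):
    """True if both files hold identical text (same lines, same bytes)."""
    return corpus_fingerprint(path_a) == corpus_fingerprint(path_b)
```

For example, same_corpus("europarl-v3b.fr-en.en.gz", 
"wmt08/training/europarl-v3.fr-en.en") would compare the downloaded English 
side against the file name used by the baseline, provided a file with that 
second name actually exists locally. If the result is False, the line counts 
from corpus_fingerprint give a first hint of how far the two versions differ.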
 
I have the same question about the language model data, which I download from:
 
http://www.statmt.org/wmt08/training-monolingual.tar



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support