Hi Moonloki,

In principle, you cannot compare BLEU scores across different data samples. The 
score may vary wildly depending on the quality and size of your training set 
and on how closely the test set is related to the training data. Also, 
especially for EN–ZH translation, your results will depend on which tokeniser 
and segmenter you used for ZH.
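To see why the segmenter matters: BLEU is computed over tokens, so the very same hypothesis–reference pair can get different scores under different ZH segmentations. Here is a minimal sketch with a toy sentence-level BLEU (the sentence pair and the two segmentations are made up purely for illustration, and real toolkits add smoothing and multi-reference handling on top of this):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=2):
    """Toy sentence-level BLEU: geometric mean of n-gram precisions
    times a brevity penalty; returns 0 on any zero precision."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in hyp.items())
        total = sum(hyp.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(sum(map(math.log, precisions)) / max_n)

# The same (made-up) sentence pair, scored under two ZH segmentations:
ref_words = ["机器", "翻译", "系统"]   # "machine translation system"
hyp_words = ["机器", "翻译", "软件"]   # "machine translation software"
print(bleu(hyp_words, ref_words))      # word-level:      ~0.577
print(bleu(list("".join(hyp_words)),
           list("".join(ref_words))))  # character-level: ~0.632
```

Identical translations, two different BLEU scores — which is why scores reported with different segmentation pipelines are not directly comparable.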

Still, here are some results from our experience.
We train on about 5.2M segments of parallel EN–ZH_HANS in-house data, drawn 
from documentation TMs and software UI strings across our product range. We 
don’t use tuning, as the MT output may be used for data from different domains, 
i.e. from different products. We use the KyTea segmenter with the 
lcmc-0.3.0-1.mod segmentation model for ZH, and an in-house tokeniser based on 
a cascade of regular expressions.
With this setup, for data similar to our main product range, we get BLEU scores 
of about 0.50 for EN–ZH_HANS translation. For data from niche products, the 
BLEU score drops to about 0.40.


Hope this gives you a perspective.


Cheers,

Ventzi

–––––––
Dr. Ventsislav Zhechev
Computational Linguist

Language Technologies
Localisation Services
Autodesk Development Sàrl
Neuchâtel, Switzerland

http://VentsislavZhechev.eu
tel: +41 32 723 9122
fax: +41 32 723 9399


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
