How are you doing? You are right: unless you perform some kind of segmentation on non-spacing languages like Chinese, Japanese or Korean, there is no way to apply BLEU/NIST scores as is. The BLEU/NIST scores obtained would be biased by the errors of the segmenter used. On real texts, segmenters have an error rate of 5-10%, but nobody knows what their error rates would be on MT output. Moreover, even if your system outputs segmented data, you will not be able to compare it with commercial systems like Systran, which do not segment their output at all.
Our answer was to apply the BLEU/NIST formulae to character units instead of words. Of course, for this to have any meaning, you have to show a good correlation between the character-based and the word-based versions of the measures. We showed this for English, and found that BLEU computed on n-grams of up to 18 characters behaves like the "usual" BLEU on n-grams of up to 4 words. Similarly, the measure equivalent to the usual English 4-word BLEU would be 9 characters for Japanese. Hopefully these results will be published at some forthcoming conference! ;)
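For readers who want to try the idea, here is a minimal sketch of sentence-level BLEU computed over character n-grams instead of word n-grams. It is not the implementation used in our experiments. The function names `char_ngrams` and `char_bleu` are hypothetical. The sketch assumes whitespace is removed before counting, so that a segmented output and an unsegmented output (e.g. Systran's) receive the same score. It also assumes the brevity penalty uses the reference length closest to the candidate length.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    # Assumption: whitespace is dropped, so segmented and unsegmented
    # outputs yield the same character sequence; no segmenter is needed.
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))

def char_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU over character n-grams, n = 1..max_n (a sketch)."""
    cand_len = sum(1 for c in candidate if not c.isspace())
    if cand_len == 0:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = char_ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n characters
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in char_ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the closest reference length (ties -> shorter).
    ref_lens = [sum(1 for c in r if not c.isspace()) for r in references]
    closest = min(ref_lens, key=lambda r: (abs(r - cand_len), r))
    bp = 1.0 if cand_len > closest else math.exp(1 - closest / cand_len)
    # Geometric mean of the modified precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)
```

To mirror the correspondence reported above, one would call `char_bleu(hyp, refs, max_n=18)` for English or `max_n=9` for Japanese. The sketch only computes the score; it does not reproduce the correlation study with word-based BLEU.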
Best regards,
Etienne Denoual & Yves Lepage
ATR, Kyoto
Somers, Harold wrote:
Does anyone have any experience of applying BLEU/NIST scores to non-spacing languages like Japanese, Chinese, Korean? Since they are word-based metrics, I presume output has to be segmented somehow.
_______________________________________________ Mt-list mailing list
