Is there a reason why the test and training data are not segmented the same way?

On 18/01/2013 18:13, Yaqin wrote:
Dear all,

I'm using the Moses phrase-based system to translate from Chinese to English.

I found that many of the unknown words in the translation output for the
test data are caused by segmentation differences between the training
data and the test data on the Chinese side.

For example, "全球化" (globalization) is segmented as one word in the test
data, while it is segmented into two words, "全球" and "化", in the training
data. As a result, "全球化" is not recognized and cannot be translated.
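The cleaner fix is to run the same segmenter, with the same settings, over both the training and test data. If re-segmenting is not possible, one rough workaround is to split each test token that never occurs in the training data into pieces that do. The sketch below does this with greedy forward maximum matching against the training vocabulary. The function names (build_vocab, resegment_oov) and the max_len limit are hypothetical and not part of Moses; the example assumes the segmented training text has one sentence per line with space-separated tokens.

```python
def build_vocab(lines):
    """Collect every token from already-segmented, space-separated training lines."""
    return {tok for line in lines for tok in line.split()}


def resegment_oov(tokens, train_vocab, max_len=8):
    """Split tokens unseen in training into training-vocabulary pieces.

    Hypothetical helper: uses greedy forward maximum matching. A token is
    only split when every piece is in train_vocab; otherwise it is kept as-is.
    """
    out = []
    for tok in tokens:
        if tok in train_vocab:
            out.append(tok)
            continue
        pieces, i = [], 0
        while i < len(tok):
            # Try the longest candidate first, shrinking until a known piece is found.
            for j in range(min(len(tok), i + max_len), i, -1):
                if tok[i:j] in train_vocab:
                    pieces.append(tok[i:j])
                    i = j
                    break
            else:
                pieces = None  # no known piece starts at position i
                break
        out.extend(pieces if pieces else [tok])
    return out


vocab = build_vocab(["全球 化 趋势"])
print(resegment_oov(["全球化", "趋势"], vocab))
```

Greedy matching can pick the wrong split when several segmentations are possible, so this only patches the symptom; segmenting both sides consistently avoids the problem in the first place.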

Does anyone have any suggestions for this problem?

Thanks,
Yaqin



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
