Re: [Moses-support] unknown words caused by Chinese segmentation differences between training and test data

Hieu Hoang Sat, 19 Jan 2013 16:52:09 -0800

Is there a reason why the test and training data are not segmented the same way?


On 18/01/2013 18:13, Yaqin wrote:

Dear all,


I'm using the Moses phrase-based system to translate from Chinese to English.

I found that many of the unknown words in the translation output for the
test data are caused by segmentation differences between the training
data and the test data on the Chinese side.

For example, "全球化" (globalization) is segmented as one word in the test
data, while it is segmented into two words, "全球" and "化", in the training
data. As a result, "全球化" is not recognized and cannot be translated.
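A minimal Python sketch of how such mismatches can be spotted before decoding, assuming whitespace-tokenized text. The function name and the toy sentences (built around 全球化, "globalization") are hypothetical and not part of Moses. Any test token that never occurs in the training vocabulary is roughly what the decoder will treat as unknown, even when all of its characters appear in training:

```python
def find_unknown_tokens(train_lines, test_lines):
    """Return test tokens (in order of first appearance) that never occur in training."""
    # Build the training vocabulary from whitespace-separated tokens.
    train_vocab = set()
    for line in train_lines:
        train_vocab.update(line.split())

    unknown = []
    seen = set()
    for line in test_lines:
        for tok in line.split():
            # Report each out-of-vocabulary token only once.
            if tok not in train_vocab and tok not in seen:
                seen.add(tok)
                unknown.append(tok)
    return unknown


# Hypothetical toy data: training splits 全球化 as 全球 + 化, test keeps it whole.
train = ["全球 化 是 趋势"]
test = ["全球化 是 趋势"]
print(find_unknown_tokens(train, test))  # ['全球化']
```

Running the same check after segmenting both sides with one segmenter should show whether the mismatch disappears.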

Does anyone have any suggestions for this problem?

Thanks,
Yaqin



_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
