Hi Hieu,

For some reason, I don't have the model that was used to segment the training set, but I re-segmented the test data using the training set. The segmentation is now more consistent, and there are fewer unknown words.
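For readers hitting the same mismatch, here is a minimal sketch of one way to re-segment test data against the training side. It is an assumption, not necessarily the procedure used in this thread: it builds a vocabulary from the already-segmented training text and splits any unseen test token into known pieces by greedy longest match. The function names and the toy sentences are hypothetical.

```python
def build_vocab(segmented_lines):
    """Collect the set of tokens seen in whitespace-segmented training lines."""
    vocab = set()
    for line in segmented_lines:
        vocab.update(line.split())
    return vocab


def resegment_token(token, vocab):
    """Split an unseen token into training-vocabulary pieces (greedy longest match)."""
    if token in vocab:
        return [token]
    pieces = []
    i = 0
    while i < len(token):
        # Try the longest remaining substring first; a single character
        # is always accepted as a fallback so the loop makes progress.
        for j in range(len(token), i, -1):
            if token[i:j] in vocab or j == i + 1:
                pieces.append(token[i:j])
                i = j
                break
    return pieces


def resegment_line(line, vocab):
    """Re-segment one whitespace-segmented test line against the training vocabulary."""
    return " ".join(p for tok in line.split() for p in resegment_token(tok, vocab))


# Toy data (hypothetical): the training side splits "全球化" into "全球" and "化".
train = ["全球 化 的 影响", "经济 全球 化"]
vocab = build_vocab(train)
print(resegment_line("全球化 的 经济", vocab))
```

Tokens already present in the training vocabulary are left untouched, so only the words that would otherwise be unknown get split. Greedy matching can pick a wrong split for ambiguous strings, so the output is worth spot-checking.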
Thanks Jie. I'll take a look.

Yaqin

On Sat, Jan 19, 2013 at 8:08 PM, Jie Jiang <[email protected]> wrote:
> Hi Yaqin:
>
> Source side word lattice might help in this case, please refer to the
> related section in the following paper:
>
> Christopher Dyer, Smaranda Muresan, Philip Resnik, Generalizing Word Lattice
> Translation. In Proceedings of ACL-08: HLT (June 2008), pp. 1012-1020
>
> Best regards,
>
> Jie Jiang
> Senior Language Technology Specialist
>
> Capita Translation and Interpreting
>
> Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ | Tel (UK): +44
> 845 367 7000 | Tel (US): +1 (800) 579-5010
> Tel Direct: +44 (0)844 854 8984 | [email protected] | Skype ID:
> jie.jiang-capita-ti
> www.capitatranslationinterpreting.com
>
>
> 2013/1/18 Yaqin <[email protected]>
>>
>> Dear all,
>>
>> I'm using the Moses phrase-based system to translate from Chinese to English.
>>
>> I found that a lot of unknown words in the translation results of the test
>> data are caused by segmentation differences between the training data
>> and the test data on the Chinese side.
>>
>> For example, "全球化" (globalization) is segmented as one word in the test
>> data, while it is segmented into two words, "全球" and "化", in the training
>> data. Thus, "全球化" is not recognized and fails to be translated.
>>
>> Does anyone have any suggestion on this problem?
>>
>> Thanks,
>> Yaqin
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
