I'm familiar with two methods to segment Chinese. One method simply inserts a space between each character. The results are predictable, but translation quality is generally not as high as it could be.
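As a rough illustration of this first method, a minimal Python sketch might look like the following. The function name is hypothetical and not part of Moses or any LDC tool. It also assumes that runs of non-Chinese text (English words, numbers) should stay intact, which the description above does not specify.

```python
import re

# Hypothetical helper, not part of Moses or any published segmenter.
# Matches one character from the CJK Unified Ideographs block (U+4E00-U+9FFF).
_CJK = re.compile(r"([\u4e00-\u9fff])")


def segment_by_character(line: str) -> str:
    """Put a space around each CJK character, then collapse extra whitespace."""
    spaced = _CJK.sub(r" \1 ", line)
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(segment_by_character("使用文本索引查询视图"))
```

The same function would have to be applied to the training, tuning and test data alike, so that the model and the translation requests agree on what a "word" is.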
The second method uses a program that identifies words as sequences of one or more characters (typically 1, 2 or 3) and inserts a space between the words. I haven't worked with Chinese for a while, so I'm not sure of the latest advancements in Chinese word segmentation. LDC publishes a Perl script:

http://projects.ldc.upenn.edu/Chinese/
http://www.ldc.upenn.edu/Projects/Chinese/ldc-cn-seg.1.2.tgz

I remember seeing a C++ version, but can't find it now. There's also this one on Google Code:

http://code.google.com/p/zhseg/

Maybe someone on moses-support knows of other Chinese tools.

Regards,
Tom

On Thu, 18 Aug 2011 09:16:48 +0800, 蒋乾 wrote:

Hi,

Thank you for your suggestions. I have done some tests. They showed that both English-to-Chinese and Chinese-to-English training would fail if I did not take any measures. Suzy and Tom gave me the useful advice to do something like segmentation. The further question is how to do the segmentation. Could anybody who has experience training on a corpus, either from English to Chinese or from Chinese to English, give me some ideas?

Thank you very much.

Regards,
James

2011/8/17 Tom Hoar

I agree with Suzy. Also, if your translation requests are not segmented, it's possible that the training corpus was also not segmented. Verify that your training corpus, development and test sets were all segmented when you trained/tuned your translation model. If not, you'll need to start from the beginning.

Tom

On Wed, 17 Aug 2011 19:28:17 +1000, Suzy Howlett wrote:

> Hi James,
>
> It looks like the text has not been segmented into words, so it thinks
> every sentence is a single word. Unless the sentences you are trying to
> translate are identical to some sentences in the training corpus, it
> will think every test sentence is an unknown word it's never seen
> before. You'll need to use some kind of word segmentation. Unfortunately
> I don't know anything about that area, so I have no useful suggestions.
>
> Best,
> Suzy
>
> On 17/08/11 7:13 PM, 蒋乾 wrote:
>> Hi all,
>>
>> When I used MT to translate from Chinese to English, I met an
>> unexpected problem. Could you please tell me the reason if you have
>> any idea about it?
>>
>> I trained on a large parallel corpus of about 2,600,000 lines on a
>> computer with 5GB RAM.
>> After that, I tried translating a small Chinese file of about 80 lines
>> into English. Unexpectedly, it didn't work.
>> It did not do any translation work at all. The target file I got was
>> the same as the source file.
>>
>> One sample of the information shown on the screen during MT's
>> translation is as follows:
>>
>> "
>> Translating: 使用文本索引查询视图
>> Collecting options took 0.000 seconds
>> Search took 0.000 seconds
>> BEST TRANSLATION: 使用文本索引查询视图|UNK|UNK|UNK [1]
>> [total=-99.978] 0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>> Translation took 0.000 seconds
>> Finished translating
>> Translating: 使用文本索引查询视图关于
>> Collecting options took 0.000 seconds
>> Search took 0.000 seconds
>> BEST TRANSLATION: 使用文本索引查询视图关于|UNK|UNK|UNK [1]
>> [total=-99.978] 0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>> Translation took 0.000 seconds
>> Finished translating
>> "
>>
>> I would very much appreciate it if you could tell me why this happens
>> and how to solve it.
>>
>> Thank you very much.
>>
>> Regards,
>> James
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
