Re: [Moses-support] Train Moses Engine for EN to ZH_CN

Francois Masselot Tue, 31 Aug 2010 06:26:55 -0700

Dear Wenlong,

The Moses toolkit is language-independent, so there shouldn't be anything
special to do. The one thing to take care of is to tokenize the Chinese
training corpus properly. Moses expects input sentences in which words (tokens)
are separated by spaces, whereas Chinese text is usually written without spaces
between words, so it has to be segmented into words first. Beyond that there is
nothing special: I recently built English-Chinese and Chinese-English Moses
engines, and both training and decoding work just fine.
For decoding, you just need to tokenize and detokenize accordingly, i.e.
tokenize (segment) Chinese source sentences, and remove the spaces between
Chinese words when Chinese is the target language.


Regards
François


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
