There was a recent check-in for Korean:

https://github.com/moses-smt/mosesdecoder/commit/194964c017d8acb56918bab94f4d7cdd60b9c9b7

There may also be some Korean-specific or other Asian-language tools out there.


On 17/01/18 01:01, daideqi wrote:
Dear Moses-Support,

Some colleagues and I, who are all new to SMT and Moses, have some Korean parallel corpora that we want to use to train Moses.

My question is: how do we go about preparing and tokenizing the data, and can you recommend any specific tools? I searched the moses-support archive and the Interwebs and couldn't find any specific recommendations or step-by-step instructions for newbs like us.

We'd be very grateful if you could point us in the right direction.

Thanks in advance!


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
http://moses-smt.org/
