Hi Marcin,

At Autodesk we've been successfully using KyTea since 2011. The main reason we chose this specific tool is that it has readily available models for both Chinese and Japanese, which simplified its integration into our workflows. For Japanese, we also evaluated MeCab in 2011, but found that KyTea served us better.
Keep in mind, though, that we are not very interested in the quality of the segmentation per se. What we need is MT output of sufficient quality, regardless of whether the segmentation tool's output makes sense on its own.

Cheers,
Ventzi

–––––––
Dr. Ventsislav Zhechev
Computational Linguist, Certified ScrumMaster®
Platform Architecture and Technologies
Localisation Services

MAIN +41 32 723 91 22
FAX +41 32 723 93 99

http://VentsislavZhechev.eu

Autodesk, Inc.
Rue de Puits-Godet 6
2000 Neuchâtel, Switzerland
www.autodesk.com

> On 20.03.2015, at 14:32, [email protected] wrote:
>
> Date: Fri, 20 Mar 2015 13:19:02 +0100
> From: Marcin Junczys-Dowmunt <[email protected]>
> Subject: [Moses-support] Chinese segmentation/tokenization
> To: Moses Support <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> questions appear from time to time on the list concerning Chinese
> segmentation/tokenization. I saw Barry mention Lingpipe and other tools.
> Is there a favourite tool you guys prefer to use over others?
>
> Thanks,
>
> Marcin
