Hi Tom,

As far as I know, the following are widely used, open-source Chinese
tokenizers:

* https://github.com/fxsjy/jieba
* http://sourceforge.net/projects/zpar/
* https://github.com/NLPchina/ansj_seg

And this proprietary one:

* http://ictclas.nlpir.org/

(Disclaimer: I am one of the developers of jieba, and I use it myself;
a short usage sketch follows.)
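For what it's worth, here is a minimal sketch of calling jieba from
Python to produce the space-delimited tokens Moses expects. The sample
sentence is the one from jieba's own README; everything else is just
illustration:

    # pip install jieba
    import jieba

    sentence = "我来到北京清华大学"       # "I came to Tsinghua University in Beijing"
    tokens = jieba.lcut(sentence)         # accurate mode (the default); returns a list
    print(" ".join(tokens))               # one line, tokens separated by spaces
    # expected output: 我 来到 北京 清华大学

The first call loads the dictionary (a few seconds); segmentation
itself is fast after that.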

--
Dingyuan Wang
On 19 December 2015 at 00:51, "Tom Hoar" <[email protected]> wrote:

> I'm looking for Chinese and Arabic tokenizers. We've been using
> Stanford's for a while but it has downfalls. The Chinese mode loads its
> statistical models very slowly. The Arabic mode stems the resulting
> tokens. The coup de grâce is that their latest jar update (9 days ago)
> was compiled to run only with Java 1.8.
>
> So, with the exception of Stanford, what choices are available for
> Chinese and Arabic that you're finding worthwhile?
>
> Thanks!
> Tom
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
