I do, in Arraymancer: 
[https://github.com/mratsim/Arraymancer](https://github.com/mratsim/Arraymancer)

For NLP there are some wrappers here: 
[https://github.com/Nim-NLP](https://github.com/Nim-NLP), with a focus on NLP 
for the Chinese language.

For FSM-based NLP, I've come across 
[BlingFire](https://github.com/microsoft/BlingFire) from Microsoft Research, 
but I guess the most flexible tokenizer is 
[sentencepiece](https://github.com/google/sentencepiece) by Google, which does 
unsupervised training and assumes nothing about the language (e.g. whitespace 
as a word separator); you can just give it raw text to read.
