Sorry, bad link. Here it is: https://github.com/clipperhouse/uax29

On Thursday, May 7, 2020 at 12:06:18 PM UTC-4, Matt Sherman wrote:
>
> Hi gophers, I’ve implemented Unicode text segmentation for Go: 
> https://github.com/clipperhouse/uax29/words
>
> It tokenizes text into words, sentences or graphemes according to the Unicode 
> spec <https://unicode.org/reports/tr29/>. I’d been tokenizing text in ad 
> hoc ways, and then learned that there is a Unicode standard.
>
> Hopefully useful for you, feedback welcome. (I’m also talking to @mpvl 
> about how such functionality might be useful in x/text.)
>
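
For a sense of why ad hoc tokenizing falls short of the Unicode rules, here is a minimal sketch using only the standard library. adHocWords is a hypothetical helper made up for illustration and is not part of uax29. It splits on anything that is not a letter or number, which is one common ad hoc approach and not necessarily the one used before.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// adHocWords is a hypothetical helper (an assumption for illustration, not
// the uax29 API). It treats every rune that is neither a letter nor a number
// as a separator.
func adHocWords(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
}

func main() {
	// Under the UAX #29 word rules, "can't" (WB6/WB7) and "3.14" (WB11/WB12)
	// are each a single word. The ad hoc splitter breaks both apart.
	fmt.Printf("%q\n", adHocWords("I can't pay 3.14"))
	// prints ["I" "can" "t" "pay" "3" "14"]
}
```

A segmenter that follows the spec would keep "can't" and "3.14" whole, which is the kind of case that is easy to miss with hand-rolled rules.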
