Sorry, bad link. Here it is: https://github.com/clipperhouse/uax29
On Thursday, May 7, 2020 at 12:06:18 PM UTC-4, Matt Sherman wrote:
>
> Hi gophers, I’ve implemented Unicode text segmentation for Go:
> https://github.com/clipperhouse/uax29/words
>
> It tokenizes text into words, sentences or graphemes according to the Unicode
> spec <https://unicode.org/reports/tr29/>. I’d been tokenizing text in ad hoc
> ways, and then learned that there is a Unicode standard.
>
> Hopefully useful for you, feedback welcome. (I’m also talking to @mpvl
> about how such functionality might be useful in x/text.)
>

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/bbb890f3-ee1c-41f3-8468-d90b971b1977%40googlegroups.com.