Great. Yes, the data files are here: https://unicode.org/reports/tr41/tr41-26.html#Props0
I’ve done a proof of concept here: https://github.com/clipperhouse/uax29

To do it properly, I assume we’d want to use the house style here?
https://github.com/golang/text/blob/master/unicode/rangetable/gen.go

On Thu, Apr 16, 2020 at 1:52 PM <m...@golang.org> wrote:

> Yes that would be interesting. Especially if it can be generated from the
> Unicode raw data upon updates.
>
> On Wed, 15 Apr 2020 at 23:56 Ian Lance Taylor <i...@golang.org> wrote:
>
>> [ +mpvl ]
>>
>> On Wed, Apr 15, 2020 at 2:30 PM Matt Sherman <mwsher...@gmail.com> wrote:
>> >
>> > Hi, I am working on a tokenizer based on Unicode text segmentation
>> > (UAX 29). I am wondering if there would be an interest in adding range
>> > tables for word break categories to the x/text or unicode packages. It
>> > appears they could be code-gen’d alongside the rest of the range tables.
>> >
>> > Pardon if this is already being done and I have missed it. I see some
>> > mention of those categories (e.g. ALetter) in other places.
>> >
>> > My code is here. Thanks.