Great. Yes, the data files are here:
https://unicode.org/reports/tr41/tr41-26.html#Props0

I’ve done a proof of concept here: https://github.com/clipperhouse/uax29

To do it properly, I assume we’d want to use the house style here?
https://github.com/golang/text/blob/master/unicode/rangetable/gen.go
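
To make the idea concrete, here is a rough sketch of the parsing half of such a generator. It is not the gen.go house style and not the code from my repo. It assumes the usual UCD line layout ("XXXX..YYYY ; Category # comment"), and the names parseWordBreak, buildTable and inCategory, as well as the tiny inline sample, are made up for illustration. It builds the tables at run time with the standard unicode package, whereas a real generator would presumably emit Go source.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// sample mimics the layout of WordBreakProperty.txt. It is a small
// hand-picked subset for illustration, not the full data file.
const sample = `
# comment line
0041..005A    ; ALetter # LATIN CAPITAL LETTER A..Z
0061..007A    ; ALetter # LATIN SMALL LETTER A..Z
0030..0039    ; Numeric # DIGIT ZERO..DIGIT NINE
005F          ; ExtendNumLet # LOW LINE
1F1E6..1F1FF  ; Regional_Indicator # REGIONAL INDICATOR SYMBOL LETTER A..Z
`

// span is an inclusive range of code points (hypothetical helper type).
type span struct{ lo, hi rune }

// parseWordBreak reads UCD-style lines and returns one range table per category.
func parseWordBreak(data string) (map[string]*unicode.RangeTable, error) {
	spans := map[string][]span{}
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop trailing comment
		}
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.SplitN(line, ";", 2)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		lo, hi, err := parseRange(strings.TrimSpace(fields[0]))
		if err != nil {
			return nil, err
		}
		cat := strings.TrimSpace(fields[1])
		spans[cat] = append(spans[cat], span{lo, hi})
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	tables := make(map[string]*unicode.RangeTable, len(spans))
	for cat, s := range spans {
		tables[cat] = buildTable(s)
	}
	return tables, nil
}

// parseRange parses "XXXX" or "XXXX..YYYY" (hexadecimal code points).
func parseRange(s string) (lo, hi rune, err error) {
	parts := strings.SplitN(s, "..", 2)
	l, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	h := l
	if len(parts) == 2 {
		if h, err = strconv.ParseUint(parts[1], 16, 32); err != nil {
			return 0, 0, err
		}
	}
	if h < l || h > unicode.MaxRune {
		return 0, 0, fmt.Errorf("bad range: %q", s)
	}
	return rune(l), rune(h), nil
}

// buildTable turns spans into a RangeTable. It assumes the spans do not
// overlap, and splits any span that crosses the 16-bit boundary.
func buildTable(s []span) *unicode.RangeTable {
	sort.Slice(s, func(i, j int) bool { return s[i].lo < s[j].lo })
	t := &unicode.RangeTable{}
	for _, sp := range s {
		lo, hi := sp.lo, sp.hi
		if lo <= 0xFFFF {
			h := hi
			if h > 0xFFFF {
				h = 0xFFFF
			}
			t.R16 = append(t.R16, unicode.Range16{Lo: uint16(lo), Hi: uint16(h), Stride: 1})
			if h <= unicode.MaxLatin1 {
				t.LatinOffset++
			}
			lo = 0x10000
		}
		if hi >= 0x10000 {
			t.R32 = append(t.R32, unicode.Range32{Lo: uint32(lo), Hi: uint32(hi), Stride: 1})
		}
	}
	return t
}

// inCategory reports whether r is in the named category; unknown names yield false.
func inCategory(tables map[string]*unicode.RangeTable, cat string, r rune) bool {
	t, ok := tables[cat]
	return ok && unicode.Is(t, r)
}

func main() {
	tables, err := parseWordBreak(sample)
	if err != nil {
		panic(err)
	}
	for _, r := range "a7_ " {
		fmt.Printf("%q ALetter=%v Numeric=%v\n", r,
			inCategory(tables, "ALetter", r), inCategory(tables, "Numeric", r))
	}
}
```

The resulting tables work with unicode.Is like any of the existing ones, which is the main reason I'd like them generated alongside the rest.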

On Thu, Apr 16, 2020 at 1:52 PM <m...@golang.org> wrote:

> Yes that would be interesting. Especially if it can be generated from the
> Unicode raw data upon updates.
>
> On Wed, 15 Apr 2020 at 23:56 Ian Lance Taylor <i...@golang.org> wrote:
>
>> [ +mpvl ]
>>
>> On Wed, Apr 15, 2020 at 2:30 PM Matt Sherman <mwsher...@gmail.com> wrote:
>> >
>> > Hi, I am working on a tokenizer based on Unicode text segmentation (UAX
>> > 29). I am wondering if there would be an interest in adding range tables
>> > for word break categories to the x/text or unicode packages. It appears
>> > they could be code-gen’d alongside the rest of the range tables.
>> >
>> > Pardon if this is already being done and I have missed it. I see some
>> > mention of those categories (e.g. ALetter) in other places.
>> >
>> > My code is here. Thanks.
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "golang-nuts" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to golang-nuts+unsubscr...@googlegroups.com.
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/golang-nuts/2a058556-da51-46d0-a41b-28e323541332%40googlegroups.com.
>>
>

