[go-nuts] x/text: Interest in Unicode text segmentation?

Matt Sherman Wed, 15 Apr 2020 14:31:09 -0700

Hi, I am working on a tokenizer based on Unicode text segmentation (UAX 29 
<https://unicode.org/reports/tr29/#Word_Boundaries>). I am wondering whether 
there would be interest in adding range tables for the word break categories 
<https://unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakProperty.txt> to 
the x/text or unicode packages. It appears they could be code-generated 
alongside the rest of the range tables.
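
To illustrate what such a table might look like, here is a minimal sketch 
using the standard library's unicode.RangeTable type. The names aLetterSubset 
and isALetterSubset are hypothetical, and the table is a hand-written subset 
(ASCII letters only) of the ALetter category, not the full generated data 
from WordBreakProperty.txt.

```go
package main

import (
	"fmt"
	"unicode"
)

// aLetterSubset is a hypothetical, hand-written subset of the ALetter
// word break category covering ASCII letters only. A generated table
// would cover every ALetter range listed in WordBreakProperty.txt.
var aLetterSubset = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x0041, Hi: 0x005A, Stride: 1}, // A..Z
		{Lo: 0x0061, Hi: 0x007A, Stride: 1}, // a..z
	},
	// Number of R16 entries with Hi <= unicode.MaxLatin1.
	LatinOffset: 2,
}

// isALetterSubset reports whether r falls in the illustrative subset.
func isALetterSubset(r rune) bool {
	return unicode.Is(aLetterSubset, r)
}

func main() {
	// Print each rune of a sample string with its membership result.
	for _, r := range "a1 Z" {
		fmt.Printf("%q %v\n", r, isALetterSubset(r))
	}
}
```

A tokenizer could then look up each rune's category with unicode.Is, the 
same way the existing script and property tables are queried.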


Pardon me if this is already being done and I have missed it. I see some 
mentions <https://github.com/golang/text/search?q=ALetter&unscoped_q=ALetter> of 
those categories (e.g. ALetter) elsewhere in the x/text repository.

My code is here <https://github.com/clipperhouse/uax29>. Thanks.
