stevengj edited a comment on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087


   > It seems utf8proc (incorrectly?) claims some undefined codepoints (e.g. https://www.compart.com/en/unicode/U+08BE) are UTF8PROC_CATEGORY_LO (General category Letter Other).
   
   [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was added in Unicode 13, so category Lo is correct. It sounds like you may be looking at outdated Unicode tables?
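
As a rough illustration of why the table version matters, here is a minimal Python sketch (not utf8proc itself) that reports a codepoint's general category together with the Unicode version of the tables being consulted. The helper name `category_with_version` is hypothetical. Python's `unicodedata` module is used only as a stand-in for whichever tables you are checking against.

```python
import unicodedata


def category_with_version(codepoint):
    """Return (unicode_table_version, general_category) for a codepoint.

    Hypothetical helper for illustration: the category reported for a
    recently added codepoint depends on the Unicode version of the tables.
    """
    return unicodedata.unidata_version, unicodedata.category(chr(codepoint))


if __name__ == "__main__":
    version, category = category_with_version(0x08BE)
    # With tables older than Unicode 13 this codepoint is unassigned (Cn);
    # with Unicode 13 or newer it is reported as Lo.
    print("Unicode", version, "-> U+08BE is", category)
```

If the tables predate Unicode 13, the codepoint shows up as unassigned, which would explain the apparent disagreement.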
   
   > utf8proc doesn't store and expose the information if a codepoint is of a 
Numeric type
   
   Can't you use the Unicode general category (N*, i.e. Nd, Nl, or No) for this? That's [what Julia does](https://github.com/JuliaLang/julia/blob/master/base/strings/unicode.jl#L405).
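
A minimal sketch of the category-based check, written in Python with `unicodedata` rather than against utf8proc. The function name `is_numeric_category` is hypothetical. In C, the analogous test would presumably compare the result of `utf8proc_category` against `UTF8PROC_CATEGORY_ND`, `UTF8PROC_CATEGORY_NL`, and `UTF8PROC_CATEGORY_NO`.

```python
import unicodedata


def is_numeric_category(ch):
    """Return True if the character's general category is Nd, Nl, or No."""
    # All three numeric categories start with "N", so a prefix test suffices.
    return unicodedata.category(ch).startswith("N")


if __name__ == "__main__":
    for ch in ["5", "\u2167", "\u00bd", "a"]:
        print("U+%04X" % ord(ch), unicodedata.category(ch), is_numeric_category(ch))
```

One caveat, stated as an observation rather than a claim about what the PR needs: a few characters that carry a numeric value, such as the CJK ideograph U+4E09, are categorized Lo, so a category-only test will not treat them as numeric.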



