jorisvandenbossche commented on issue #34599:
URL: https://github.com/apache/arrow/issues/34599#issuecomment-1473321273

   Arrow uses the `utf8proc` C library for UTF-8 operations 
(https://juliastrings.github.io/utf8proc/). 
   
   And this library changed the upper case for "ß" from "SS" to "ẞ" a few years 
ago: https://github.com/JuliaStrings/utf8proc/issues/130
   
   There seems to be some disagreement about what the correct upper case 
should be. For example, see also https://bugs.openjdk.org/browse/JDK-8186073 . 
The Unicode code chart (http://unicode.org/charts/PDF/U1E00.pdf) mentions:
   
   > The capital letter sharp s is part of the official German
   > orthography since 2017. Along with "SS" it is an allowed
   > variant spelling of 00DF in "all caps" style
   
   https://www.fileformat.info/info/unicode/char/00df/index.htm mentions 
*"uppercase is "SS" (standard case mapping), alternatively 
[U+1E9E](https://www.fileformat.info/info/unicode/char/1e9e/index.htm)"*
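
To make the two candidate mappings concrete, here is a minimal Python sketch. It compares Python's built-in `str.upper()` (which maps "ß" to "SS") with the single code point U+1E9E, and optionally calls `pyarrow.compute.utf8_upper` if pyarrow happens to be installed. The helper names (`describe`, `python_upper`, `arrow_upper`) are made up for this illustration, and what Arrow returns will presumably depend on the `utf8proc` version it was built against, so no expected Arrow output is given.

```python
import unicodedata

SMALL_SHARP_S = "\u00df"    # ß
CAPITAL_SHARP_S = "\u1e9e"  # ẞ


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def python_upper(text):
    """Upper-case with Python's built-in str.upper(), which maps ß to "SS"."""
    return text.upper()


def arrow_upper(text):
    """Upper-case via pyarrow's utf8_upper, or None if pyarrow is missing.

    The result is not asserted here because it is assumed to depend on the
    bundled utf8proc version.
    """
    try:
        import pyarrow as pa
        import pyarrow.compute as pc
    except ImportError:
        return None
    return pc.utf8_upper(pa.array([text]))[0].as_py()


if __name__ == "__main__":
    print(describe(SMALL_SHARP_S))
    print(describe(CAPITAL_SHARP_S))
    print("Python:", python_upper("straße"))
    print("Arrow: ", arrow_upper("straße"))
```

Comparing the two printed results on a given installation shows which of the two mappings that Arrow build applies.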

