Omega359 opened a new issue, #9053:
URL: https://github.com/apache/arrow-datafusion/issues/9053
### Describe the bug
`upper` and `lower` should apply the Unicode case mappings, not only the
ASCII ones. I believe this boils down to using
`string.to_ascii_lowercase()` and `string.to_ascii_uppercase()` instead of
`string.to_lowercase()` and `string.to_uppercase()`.
If you use a Unicode `LC_CTYPE` in Postgres (i.e. not `C`), the corresponding
calls properly respect the Unicode case mappings.
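The difference between the two method families can be checked in isolation. The following is a minimal sketch using only the Rust standard library; the helper names (`ascii_upper`, `unicode_upper`, `ascii_lower`, `unicode_lower`) are hypothetical and are not DataFusion functions. It assumes the inputs use precomposed characters, as in the queries in this report.

```rust
// Hypothetical helpers for illustration only; these are not DataFusion APIs.

/// ASCII-only uppercasing: bytes outside a-z are left untouched.
fn ascii_upper(s: &str) -> String {
    s.to_ascii_uppercase()
}

/// Unicode-aware uppercasing, following the Unicode case mappings.
fn unicode_upper(s: &str) -> String {
    s.to_uppercase()
}

/// ASCII-only lowercasing: bytes outside A-Z are left untouched.
fn ascii_lower(s: &str) -> String {
    s.to_ascii_lowercase()
}

/// Unicode-aware lowercasing, following the Unicode case mappings.
fn unicode_lower(s: &str) -> String {
    s.to_lowercase()
}

fn main() {
    let lower_input = "árvore ação αβγ";
    let upper_input = "ÁRVORE AÇÃO ΑΒΓ";

    // The ASCII variants leave the accented Latin and Greek letters unchanged.
    println!("{}", ascii_upper(lower_input));
    println!("{}", ascii_lower(upper_input));

    // The Unicode variants convert every cased letter.
    println!("{}", unicode_upper(lower_input));
    println!("{}", unicode_lower(upper_input));
}
```

Note that the Unicode variants can change the byte length of a string (for example, `ß` uppercases to `SS`), so any code that assumes the output is the same length as the input would need to be revisited.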
### To Reproduce
❯ select upper('árvore ação αβγ');
+--------------------------------+
| upper(Utf8("árvore ação αβγ")) |
+--------------------------------+
| áRVORE AçãO αβγ                |
+--------------------------------+
❯ select lower('ÁRVORE AÇÃO ΑΒΓ');
+--------------------------------+
| lower(Utf8("ÁRVORE AÇÃO ΑΒΓ")) |
+--------------------------------+
| Árvore aÇÃo ΑΒΓ                |
+--------------------------------+
### Expected behavior
`upper` and `lower` respect the Unicode case mappings, so the queries above
return `ÁRVORE AÇÃO ΑΒΓ` and `árvore ação αβγ` respectively.
### Additional context
_No response_