Arina Ielchiieva created DRILL-5477:
---------------------------------------
Summary: String functions (lower, upper, initcap) should work for
UTF-8
Key: DRILL-5477
URL: https://issues.apache.org/jira/browse/DRILL-5477
Project: Apache Drill
Issue Type: Improvement
Components: Functions - Drill
Affects Versions: 1.10.0
Reporter: Arina Ielchiieva
Drill string functions lower / upper / initcap work only for ASCII characters,
not for the full range of UTF-8. UTF-8 is a variable-length encoding: non-ASCII
characters occupy multiple bytes and must be decoded to Unicode code points
before case conversion, then re-encoded afterwards. Without that step, these
functions won't work for Cyrillic, Greek or any other script with upper/lower
case distinctions.
Currently, when a user applies these functions to a string containing non-ASCII
UTF-8 characters, Drill returns the input value unchanged.
Example:
{noformat}
select upper('привет') from (values(1)) -> привет
{noformat}
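The behavior in the example above can be reproduced outside Drill with a minimal Java sketch. It assumes the conversion is done byte by byte and only touches ASCII letters; that is an assumption about the cause, not a copy of Drill's code. The class and method names (Utf8CaseSketch, asciiOnlyUpper, utf8Upper) are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class for illustration only; not part of Drill.
final class Utf8CaseSketch {

    // Assumed ASCII-only approach: shift bytes in 'a'..'z' to 'A'..'Z'.
    // Every byte of a multi-byte UTF-8 sequence is >= 0x80 (negative as a
    // Java byte), so non-ASCII characters pass through unchanged.
    static byte[] asciiOnlyUpper(byte[] utf8) {
        byte[] out = utf8.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = (byte) (out[i] - ('a' - 'A'));
            }
        }
        return out;
    }

    // Unicode-aware approach: decode UTF-8 to a String, convert the case,
    // then encode back to UTF-8. Locale.ROOT keeps the result independent
    // of the JVM's default locale.
    static byte[] utf8Upper(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }
}
```

With the UTF-8 bytes of 'привет' as input, asciiOnlyUpper returns the bytes unchanged, while utf8Upper returns the bytes of 'ПРИВЕТ'. Note that a Unicode-aware conversion can change the byte length of the value (for example, Java maps 'ß' to 'SS'), so a fix along these lines cannot assume the output fits in the input's buffer.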
There is a disabled unit test in
https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33
which should be enabled once this issue is fixed.