[
https://issues.apache.org/jira/browse/DRILL-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arina Ielchiieva updated DRILL-5477:
------------------------------------
Description:
The Drill string functions lower / upper / initcap work only for ASCII input,
not for non-ASCII UTF-8 text. UTF-8 is a multi-byte encoding, so its bytes must
be decoded to Unicode characters before case conversion. Without that decoding,
these functions do not work for Cyrillic, Greek, or any other script with an
upper/lower case distinction.
Currently, when a user applies these functions to non-ASCII UTF-8 input, Drill
returns the input value unchanged.
Example:
{noformat}
select upper('привет') from (values(1)) -> привет
{noformat}
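The behaviour above can be reproduced outside Drill with a minimal Java sketch.
It assumes the ASCII-only path upper-cases byte by byte, which this issue does
not state, so the class and method names below are hypothetical and this is not
Drill's actual implementation. The string literal is written with Unicode
escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration only; not taken from the Drill code base.
public class Utf8UpperSketch {

    // Assumed ASCII-only behaviour: touch only bytes in the range 'a'..'z'.
    // Every byte of a multi-byte UTF-8 sequence is >= 0x80 (negative as a
    // Java byte), so such bytes are never changed.
    static byte[] asciiUpper(byte[] utf8) {
        byte[] out = utf8.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = (byte) (out[i] - 32);
            }
        }
        return out;
    }

    // Decode the UTF-8 bytes to characters, upper-case them, encode again.
    static byte[] unicodeUpper(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Cyrillic word from the example, as Unicode escapes.
        String word = "\u043f\u0440\u0438\u0432\u0435\u0442";
        byte[] input = word.getBytes(StandardCharsets.UTF_8);

        String viaAscii = new String(asciiUpper(input), StandardCharsets.UTF_8);
        String viaUnicode = new String(unicodeUpper(input), StandardCharsets.UTF_8);

        System.out.println("ASCII-only result unchanged: " + viaAscii.equals(word));
        System.out.println("Decoded result upper-cased:  "
                + viaUnicode.equals("\u041f\u0420\u0418\u0412\u0415\u0422"));
    }
}
```

Decoding first costs an extra conversion per value, but it is what lets the
case mapping see whole characters instead of individual bytes.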
There is a disabled unit test in
https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33
which should be enabled once the issue is fixed.
Please note that by default Calcite does not allow UTF-8 string literals. Set
the system property *saffron.default.charset* to *UTF-16LE* if you encounter the
following error:
{noformat}
org.apache.drill.exec.rpc.RpcException:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
CalciteException: Failed to encode 'привет' in character set 'ISO-8859-1'
{noformat}
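Why the literal cannot be encoded in ISO-8859-1 but can in UTF-16LE is shown by
the following Java sketch. It uses only the standard java.nio.charset API and
does not reproduce Calcite's own check; the class and method names are
hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Calcite's validation code.
public class CharsetEncodeCheck {

    // True if every character of the text is representable in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The Cyrillic word from the error message, as Unicode escapes.
        String word = "\u043f\u0440\u0438\u0432\u0435\u0442";

        System.out.println("ISO-8859-1: " + canEncode(word, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-16LE:   " + canEncode(word, StandardCharsets.UTF_16LE));
    }
}
```

ISO-8859-1 covers only Latin-1 characters, so the check fails for Cyrillic
text, while UTF-16LE can represent any Unicode character.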
was:
The Drill string functions lower / upper / initcap work only for ASCII input,
not for non-ASCII UTF-8 text. UTF-8 is a multi-byte encoding, so its bytes must
be decoded to Unicode characters before case conversion. Without that decoding,
these functions do not work for Cyrillic, Greek, or any other script with an
upper/lower case distinction.
Currently, when a user applies these functions to non-ASCII UTF-8 input, Drill
returns the input value unchanged.
Example:
{noformat}
select upper('привет') from (values(1)) -> привет
{noformat}
There is a disabled unit test in
https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33
which should be enabled once the issue is fixed.
> String functions (lower, upper, initcap) should work for UTF-8
> --------------------------------------------------------------
>
> Key: DRILL-5477
> URL: https://issues.apache.org/jira/browse/DRILL-5477
> Project: Apache Drill
> Issue Type: Improvement
> Components: Functions - Drill
> Affects Versions: 1.10.0
> Reporter: Arina Ielchiieva