[ 
https://issues.apache.org/jira/browse/DRILL-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5477:
---------------------------------
    Labels: doc-impacting  (was: )

> String functions (lower, upper, initcap) should work for UTF-8
> --------------------------------------------------------------
>
>                 Key: DRILL-5477
>                 URL: https://issues.apache.org/jira/browse/DRILL-5477
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 1.10.0
>            Reporter: Arina Ielchiieva
>              Labels: doc-impacting
>
> Drill string functions lower / upper / initcap work only for ASCII input, not
> for non-ASCII UTF-8 text. UTF-8 is a multi-byte encoding, so its bytes must be
> decoded into Unicode characters before case conversion and re-encoded
> afterwards. Without that step, these functions do not work for Cyrillic,
> Greek, or any other script with an upper/lower case distinction.
> Currently, when a user applies these functions to such UTF-8 text, Drill
> returns the input value unchanged.
> Example:
> {noformat}
> select upper('привет') from (values(1)) -> привет
> {noformat}
> There is a disabled unit test in
> https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33
> which should be enabled once the issue is fixed.
> Please note that, by default, Calcite does not allow the use of UTF-8. Set the
> system property *saffron.default.charset* to *UTF-16LE* if you encounter the
> following error:
> {noformat}
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> CalciteException: Failed to encode 'привет' in character set 'ISO-8859-1'
> {noformat}
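
The behaviour described in the issue can be illustrated with a minimal Java sketch. It is not Drill's implementation; the class and method names (Utf8CaseSketch, asciiUpper, unicodeUpper) are hypothetical. It assumes an ASCII-only routine that shifts bytes in the range 'a'..'z', and contrasts it with decoding the UTF-8 bytes, upper-casing the resulting string, and encoding it again. The Cyrillic literals are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class Utf8CaseSketch {

    // Hypothetical ASCII-only upper-casing: only bytes 'a'..'z' are changed.
    // Every byte of a multi-byte UTF-8 sequence has the high bit set, so it
    // is negative as a Java byte and passes through untouched.
    public static byte[] asciiUpper(byte[] utf8) {
        byte[] out = utf8.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = (byte) (out[i] - ('a' - 'A'));
            }
        }
        return out;
    }

    // Decode UTF-8 into a String, upper-case it, then encode back to UTF-8.
    // Locale.ROOT avoids locale-specific rules such as the Turkish dotted I.
    public static byte[] unicodeUpper(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u043f\u0440\u0438\u0432\u0435\u0442" is the string from the issue (privet).
        byte[] input = "\u043f\u0440\u0438\u0432\u0435\u0442".getBytes(StandardCharsets.UTF_8);
        byte[] expectedUpper =
            "\u041f\u0420\u0418\u0412\u0415\u0422".getBytes(StandardCharsets.UTF_8);

        // The byte-wise routine returns the input unchanged, as in the report.
        System.out.println("ASCII-only unchanged: "
            + Arrays.equals(asciiUpper(input), input));
        // Decoding first produces the upper-case Cyrillic string.
        System.out.println("Decoded upper matches: "
            + Arrays.equals(unicodeUpper(input), expectedUpper));
    }
}
```

The sketch compares byte arrays rather than printing the strings, so the result does not depend on the console encoding.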



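The CalciteException quoted above reports a failure to encode the literal in ISO-8859-1. The following Java sketch, which uses only the standard java.nio.charset API and is not Calcite code, shows why that character set rejects the Cyrillic string while UTF-16LE accepts it. The class name CharsetEncodeSketch and its helper are hypothetical. As a Java system property, saffron.default.charset would typically be passed to the JVM as -Dsaffron.default.charset=UTF-16LE; where that option belongs in a particular Drill deployment is not stated in the issue.

```java
import java.nio.charset.Charset;

public class CharsetEncodeSketch {

    // Returns true if every character of text is representable in the named charset.
    public static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The string from the issue, written with Unicode escapes.
        String text = "\u043f\u0440\u0438\u0432\u0435\u0442";

        // ISO-8859-1 covers only U+0000..U+00FF, so Cyrillic cannot be encoded.
        System.out.println("ISO-8859-1: " + canEncode("ISO-8859-1", text)); // false
        // UTF-16LE can represent any Unicode character.
        System.out.println("UTF-16LE:   " + canEncode("UTF-16LE", text));   // true
    }
}
```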
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
