[ 
https://issues.apache.org/jira/browse/IMPALA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900176#comment-16900176
 ] 

Tim Armstrong commented on IMPALA-2019:
---------------------------------------

The docs also state that the character set is ASCII, i.e. a single-byte 
character set (see 
https://impala.apache.org/docs/build/html/topics/impala_string.html). As I said, 
this is the expected behaviour for now, and changing the default would be a 
breaking change.
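
For readers unfamiliar with the distinction, here is a minimal Python sketch of 
why a byte-oriented length differs from a character-oriented one on non-ASCII 
UTF-8 input. It does not call Impala or Hive. The helper names byte_length and 
char_length are hypothetical, and the sketch assumes a single-byte string 
function simply counts the bytes of the UTF-8 encoding.

```python
def byte_length(s: str) -> int:
    """Count the bytes of the UTF-8 encoding of s.

    Assumption: this is what a length function reports when it treats
    the string as a single-byte character set.
    """
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Count the Unicode code points in s."""
    return len(s)


# "Andr\u00e9s" uses the precomposed e-acute (U+00E9), which takes
# two bytes in UTF-8; every other character here takes one byte.
name = "Andr\u00e9s"
print(byte_length(name), char_length(name))  # prints "7 6"
```

For pure ASCII input the two counts agree, which is why the difference only 
shows up once non-ASCII characters are stored.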

> Proper UTF-8 support in string functions
> ----------------------------------------
>
>                 Key: IMPALA-2019
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2019
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: Impala 2.1, Impala 2.2
>            Reporter: Andrés Cordero
>            Priority: Minor
>              Labels: sql-language
>
> As documented here: 
> http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_string.html
> Impala does not properly handle non-ASCII UTF-8 characters, and will return 
> results in string functions such as length that are inconsistent with Hive.



