[
https://issues.apache.org/jira/browse/IMPALA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900176#comment-16900176
]
Tim Armstrong commented on IMPALA-2019:
---------------------------------------
The docs also state that the character set is ASCII, i.e. a single-byte
character set:
https://impala.apache.org/docs/build/html/topics/impala_string.html. As I said,
this is the expected behaviour for now, and changing the default would be a
breaking change.
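
To illustrate the discrepancy being discussed, here is a minimal Python sketch
(not Impala code). It assumes that a single-byte string model makes length()
count bytes of the UTF-8 encoding, while a UTF-8-aware implementation counts
characters. The helper names byte_length and char_length are hypothetical and
exist only for this example.

```python
def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s.

    Assumption: this mirrors what a single-byte (ASCII) string model
    would report for the length of UTF-8 data.
    """
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Number of Unicode code points in s.

    Assumption: this mirrors what a UTF-8-aware length function reports.
    """
    return len(s)


if __name__ == "__main__":
    name = "Andr\u00e9s"  # 'e' with acute accent takes two bytes in UTF-8
    print(byte_length(name))  # 7
    print(char_length(name))  # 6
```

For pure ASCII input the two counts agree, which is why the difference only
shows up once non-ASCII characters appear in the data.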
> Proper UTF-8 support in string functions
> ----------------------------------------
>
> Key: IMPALA-2019
> URL: https://issues.apache.org/jira/browse/IMPALA-2019
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 2.1, Impala 2.2
> Reporter: Andrés Cordero
> Priority: Minor
> Labels: sql-language
>
> As documented here:
> http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_string.html
> Impala does not properly handle non-ASCII UTF-8 characters, and string
> functions such as length return results that are inconsistent with Hive.