[ https://issues.apache.org/jira/browse/ZOOKEEPER-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873690#comment-16873690 ]

David Mollitor commented on ZOOKEEPER-3342:
-------------------------------------------

Java has also historically used the same encoding as the one you presented. 
Regardless, UTF-8 can represent every Unicode code point that UTF-16 can. As 
with everything else in Java, the internal String encoding behaves the same on 
every platform.

https://softwareengineering.stackexchange.com/questions/174947/why-does-java-use-utf-16-for-internal-string-representation
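A minimal Java sketch of that point, for illustration only (the class name Utf16ToUtf8RoundTrip is made up, not ZooKeeper code): a character outside the Basic Multilingual Plane occupies two UTF-16 code units in a Java String, yet it survives a UTF-8 encode/decode round trip unchanged.

```java
import java.nio.charset.StandardCharsets;

// Sketch: Java Strings are UTF-16 internally; UTF-8 can still encode every
// valid code point they hold, including supplementary characters that
// UTF-16 stores as surrogate pairs.
class Utf16ToUtf8RoundTrip {

    // Encode a String to UTF-8 bytes, then decode those bytes back.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(emoji.length());                                // 2 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length()));       // 1 (code point)
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length); // 4 (UTF-8 bytes)
        System.out.println(roundTrip(emoji).equals(emoji));                // true
    }
}
```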

I wouldn't recommend making the character encoding a configurable option.

There's currently no way to record which encoding was used in all the various 
places in ZK, so if the default changes between server restarts, reading a 
snapshot, a ZNode name, a ZNode value, etc. may break.
Allowing a configurable character encoding would explode the test matrix for 
ZK. Using UTF-8, which covers pretty much every language, keeps the testing 
in check.
Since we're changing to UTF-8, which is the most permissive encoding, the chance 
of a backwards compatibility issue is very low.

http://utf8everywhere.org/
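One way to express "not configurable" in code, sketched under the assumption that all String/byte conversions go through a single helper. ZkText and its methods are hypothetical and not part of ZooKeeper; the actual change may simply pass StandardCharsets.UTF_8 at each call site.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one fixed charset for every place that turns
// Strings into bytes (snapshots, ZNode names, ZNode data, ...).
final class ZkText {

    // Deliberately a compile-time constant, never read from configuration,
    // so every node and every restart uses the same encoding.
    static final Charset CHARSET = StandardCharsets.UTF_8;

    private ZkText() {}

    static byte[] encode(String s) {
        return s.getBytes(CHARSET);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, CHARSET);
    }

    public static void main(String[] args) {
        // ASCII, Latin-1, Greek and CJK samples all round-trip through UTF-8.
        String[] samples = {"/zookeeper", "/caf\u00e9", "/\u03b6\u03c9\u03bf", "/\u52a8\u7269\u56ed"};
        for (String s : samples) {
            System.out.println(s.equals(decode(encode(s)))); // true for each sample
        }
    }
}
```

Keeping the charset out of the configuration leaves exactly one encoding to test, which is the test-matrix argument made above.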

> Use StandardCharsets
> --------------------
>
>                 Key: ZOOKEEPER-3342
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3342
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {quote}
> Encodes this String into a sequence of bytes using the platform's default 
> charset, storing the result into a new byte array. The behavior of this 
> method when this string cannot be encoded in the default charset is 
> unspecified.
> https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes--
> {quote}
> Since this is a distributed system, it is always possible that different 
> nodes have different default charsets defined. I think it's safest to 
> specify the charset explicitly on all nodes. For example, an upgraded JVM 
> could use a different default, so during a rolling upgrade of the JVM, 
> different nodes would have different defaults.
> * The default charset is often "ISO-8859-1". UTF-8 covers more of our 
> international friends.
> * Explicitly specifying the Charset yields slight performance gains
> * Explicitly specifying the Charset removes the need for try/catch blocks for 
> UnsupportedEncodingException
> https://blog.codecentric.de/en/2014/04/faster-cleaner-code-since-java-7/
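The last bullet, sketched in Java for illustration (ExplicitCharset and its method names are hypothetical): looking a charset up by name forces handling of the checked UnsupportedEncodingException, while passing a StandardCharsets constant (Java 7+) does not, and both produce the same bytes.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharset {

    // Before: the charset is looked up by name, so the compiler forces
    // handling of the checked UnsupportedEncodingException, even though
    // every Java platform is required to support UTF-8.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    // After (Java 7+): getBytes(Charset) declares no checked exception.
    static byte[] encodeWithConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Avoid: s.getBytes() uses the platform default charset,
    // which may differ between nodes.

    public static void main(String[] args) {
        String path = "/caf\u00e9";
        System.out.println(Arrays.equals(encodeByName(path), encodeWithConstant(path))); // true
    }
}
```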



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
