[ https://issues.apache.org/jira/browse/ZOOKEEPER-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873690#comment-16873690 ]
David Mollitor commented on ZOOKEEPER-3342:
-------------------------------------------

Java has also historically used the same encoding as the one you presented. Regardless, UTF-8 can capture all UTF-16 values (and then some). Like all things Java, the character encoding works correctly across platforms.

https://softwareengineering.stackexchange.com/questions/174947/why-does-java-use-utf-16-for-internal-string-representation

I wouldn't recommend making the character encoding a configurable option. There's currently no way to record the encoding used in all the various places in ZK, so if the default changes between server restarts, then reading a snapshot, reading a ZNode name, reading a ZNode value, etc. may break.

Allowing for a configurable character encoding will explode the test matrix for ZK. Using UTF-8, which covers pretty much every language, will keep the testing in check.

Since we're changing to UTF-8, which is the most permissive, the chance of a backwards compatibility issue is very low.

http://utf8everywhere.org/

> Use StandardCharsets
> --------------------
>
>                 Key: ZOOKEEPER-3342
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3342
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {quote}
> Encodes this String into a sequence of bytes using the platform's default
> charset, storing the result into a new byte array. The behavior of this
> method when this string cannot be encoded in the default charset is
> unspecified.
> https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes--
> {quote}
> Since this is a distributed system, it is always possible that different
> nodes have different default charsets defined. I think it's safest to
> specify it explicitly across all nodes. You could, for example,
> see a situation where an upgraded JVM uses a different default, so that during a
> rolling upgrade of the JVM, different nodes have different defaults.
> * The default charset is usually "ISO-8859-1". UTF-8 covers more of our
> international friends.
> * Explicitly specifying the Charset yields slight performance gains
> * Explicitly specifying the Charset removes the need for try/catch blocks of
> UnsupportedEncodingException
> https://blog.codecentric.de/en/2014/04/faster-cleaner-code-since-java-7/

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
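
For readers following the thread, the difference between the call styles discussed above can be sketched roughly as follows. This is only an illustration, not code from the ZooKeeper patch: the class name CharsetExample, its helper methods, and the sample path are hypothetical. It assumes nothing beyond the standard String.getBytes overloads and java.nio.charset.StandardCharsets (Java 7+).

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of ZooKeeper.
final class CharsetExample {

    // Older style: the charset is looked up by name, which forces callers
    // to handle a checked exception even for an always-supported charset.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is unreachable.
            throw new IllegalStateException(e);
        }
    }

    // Explicit Charset constant: no checked exception, and the result does
    // not depend on the platform default of the node doing the encoding.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding with the same explicit charset keeps the round trip stable.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical ZNode-like path containing one non-ASCII character,
        // written as a Unicode escape so the source file encoding is irrelevant.
        String path = "/caf\u00e9";
        byte[] bytes = encode(path);

        System.out.println(Arrays.equals(bytes, encodeByName(path))); // prints "true"
        System.out.println(decode(bytes).equals(path));               // prints "true"

        // By contrast, path.getBytes() with no argument uses
        // Charset.defaultCharset(), which can differ from node to node.
    }
}
```

The no-argument getBytes() is the case the issue description warns about: two nodes with different defaults could produce different bytes for the same non-ASCII string, whereas the explicit form gives the same bytes everywhere.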