[
https://issues.apache.org/jira/browse/ACCUMULO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489228#comment-13489228
]
Josh Elser commented on ACCUMULO-836:
-------------------------------------
*GrepIterator*: The javadoc should either note that the String being converted
to bytes is treated as UTF-8 encoded, or the code should not make the UTF-8
assertion at all.
*MetadataTable#encode(), DistributedReadWriteLock#getLockData()*: Should note
that the byte[] returned from the specified method contains UTF-8 bytes.
*LongCombiner.StringEncoder, StringMax, StringMin, StringSummation,
SummingArrayCombiner.StringArrayEncoder, Authorizations,
Master#mergeMetadataRecords*: These classes are creating bytes that are UTF-8,
but when the bytes are initially read into a String (from a Value typically),
the default encoding is used (String constructor that takes a byte array). This
leads to inconsistency as the data could have been read as something other than
UTF-8 but then written back out as UTF-8. A decision needs to be made about
what to do, and that decision needs to be documented.
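To make the inconsistency concrete, here is a minimal sketch (not Accumulo
code). The class CharsetRoundTrip and the method readThenWrite are hypothetical
names, ISO-8859-1 stands in for a platform default that happens not to be UTF-8,
and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Hypothetical helper: decode stored bytes with readCharset, then write
    // them back as UTF-8. This mimics new String(bytes) (default charset)
    // followed by getBytes(UTF-8).
    static byte[] readThenWrite(byte[] stored, Charset readCharset) {
        String decoded = new String(stored, readCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] stored = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Read and write with the same charset: the bytes survive unchanged.
        byte[] same = readThenWrite(stored, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(stored, same)); // true

        // Read as ISO-8859-1 (stand-in for a non-UTF-8 default), write as
        // UTF-8: each original byte becomes its own character and is
        // re-encoded, so the value grows from 2 bytes to 4.
        byte[] mixed = readThenWrite(stored, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.length); // 4
    }
}
```

The data only changes when the read side and the write side disagree, which is
why whichever decision is made has to apply to both.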
*ZooStore*: #setProperty(long, String, Serializable) looks awkward to me in
that it manually adds bytes to the data to be written to ZooKeeper. I don't
think UTF-8 will cause any problems, but it could definitely use some
clarification.
*TraceServer.Receiver, IndexMeta, AddFilesWithMissingEntries, MetadataTable*:
These write out a Value as UTF-8 bytes, but I'm not sure whether there is any
case in which a client reading that data would expect something else.
Documentation would again be useful. The likelihood of this being an issue is
probably small
considering that Hadoop's WritableUtils encodes Strings as UTF-8.
I'm still a little concerned about access points to ZooKeeper and !METADATA,
but given that ZooReaderWriter was converting the username and password to
UTF-8 bytes, I feel slightly better. I should dig into that code more tomorrow.
One final statement: I still believe that in the ambiguous cases where core
classes read arbitrary bytes and write UTF-8 bytes, Accumulo should be agnostic
and not make encoding assertions. In other words, I think we should revert
those changes and leave it up to the user to decide how they handle their bytes.
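As a rough sketch of what "agnostic" could look like, the example below compares
raw bytes only and leaves the choice of encoding to the caller. RawBytes and
contains are hypothetical names for illustration, not existing Accumulo classes
or methods, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class RawBytes {

    // Hypothetical helper: true if needle occurs in haystack. Only bytes are
    // compared; no charset is assumed anywhere in this method.
    static boolean contains(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return true;
        }
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The caller stored the data as ISO-8859-1, so the caller encodes the
        // search term the same way and gets a match.
        byte[] data = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1);
        byte[] term = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(contains(data, term)); // true

        // If the library forced UTF-8 on the term, the same search would miss.
        byte[] forced = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(contains(data, forced)); // false
    }
}
```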
> Specify Charset on getBytes() call for String objects.
> ------------------------------------------------------
>
> Key: ACCUMULO-836
> URL: https://issues.apache.org/jira/browse/ACCUMULO-836
> Project: Accumulo
> Issue Type: Improvement
> Affects Versions: 1.5.0
> Reporter: David Medinets
> Assignee: David Medinets
> Priority: Minor
> Fix For: 1.5.0
>
> Attachments: UTF8.java
>
>
> The comments on ACCUMULO-241 indicate that the build server might have a
> different default Charset than computers used by developers. Therefore, some
> of the tests have different results on different computers.
> Every getBytes call on a String object should specify the UTF8 Charset.
> Unfortunately the codebase has nearly 1,800 getBytes calls.