On Mon, Oct 29, 2012 at 8:46 PM, Josh Elser <[email protected]> wrote:
> +1 Mike.
>
> 1. It would be hard for me to believe Key/Value are ever handled internally
> in terms of Strings, but, if such a case does exist, it would be extremely
> prudent to fix.
>
> 2. FWIW, the Shell does use ISO-8859-1 as its charset which is referenced by
> other commands [1,2]. It would be good to double check all of the other
> commands.

I'm a bit lost. Any possible Java String can be rendered in UTF-8. So, if
you are calling String.getBytes to turn a string into bytes for some
purpose, I think you need UTF-8. On the other hand, as Mike pointed out,
new String(somebytes, "utf-8") will destroy data whenever the bytes are
not, in fact, valid UTF-8. But why would Accumulo ever need to string-ify
some array of bytes of uncertain parentage?

>
> [1]
> https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
> [2]
> https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java
>
>
> On 10/29/2012 8:27 PM, Michael Flester wrote:
>>
>> I agree with Benson entirely with one caveat. It seems to me that there
>> might be two categories of things being discussed
>>
>> 1. User data (keys and values)
>> 2. Ancillary things needed for operation of Accumulo (passwords).
>>
>> These could well be considered separately. Trying to do anything with
>> keys and values other than treating them as bytes all of the time
>> I find quite scary.
>>
>> And if this is only being done to satisfy pmd or findbugs, those tools
>> can be convinced to modify their reporting about this issue.
>>
>
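
For concreteness, here is a minimal sketch of the two directions discussed in the reply above: bytes to String and back, and String to bytes and back. It is plain JDK code, not anything taken from Accumulo; the class name CharsetRoundTrip and the helper roundTrip are hypothetical names made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Accumulo.
public class CharsetRoundTrip {

    // Decode bytes into a String with the given charset, then encode back to bytes.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Arbitrary binary data: 0xFF and 0xFE never occur in valid UTF-8.
        byte[] raw = { (byte) 0xFF, (byte) 0xFE, 0x41 };

        // UTF-8 decoding replaces each malformed byte with U+FFFD,
        // so the original bytes cannot be recovered.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));      // false

        // ISO-8859-1 maps every byte value to exactly one char, so this is lossless.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // true

        // The other direction: a String with characters outside Latin-1
        // survives UTF-8 but not ISO-8859-1.
        String text = "\u20AC"; // euro sign
        String viaUtf8 = new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        String viaLatin1 = new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals(viaUtf8));   // true
        System.out.println(text.equals(viaLatin1)); // false, the euro sign became '?'
    }
}
```

This is consistent with both observations in the thread: ISO-8859-1 happens to round-trip arbitrary bytes, and UTF-8 round-trips well-formed Strings, but neither does both. That is an argument for keeping keys and values as byte[] throughout.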
