Note that what Android documents as the behavior (including the dependency on a system property) is just one form of "you don't know what it will do". The standard Java behavior is documented thus:

"Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array."

"The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required."

Note how loosely specified "the platform's default charset" is. Even where Android narrows the behavior somewhat, it still leaves you not knowing what to expect "file.encoding" to be set to, or what you might break if you dared to change it!

This is indeed a gaping flaw from the early history of Java that they cleaned up later, in JDK 1.1. That was a very long time ago, but it's still causing problems.

I could never figure out what planet those hibyte variants originated on. On ours, there is no way to use them to produce correct behavior that isn't a completely useless special-case hack.

On Nov 10, 12:51 pm, Kostya Vasilyev <[email protected]> wrote:
> Good catch, Bob.
>
> You are right of course - I missed the lack of encoding parameter in the
> second call.
>
> The docs say this:
>
> http://developer.android.com/reference/java/lang/String.html#String(byte[])
>
> > Converts the byte array to a string using the default encoding as
> > specified by the file.encoding system property. If the system property
> > is not defined, the default encoding is ISO8859_1 (ISO-Latin-1). If
> > 8859-1 is not available, an ASCII encoding is used.
>
> Looks like the encoding is quite likely to be single-byte based, this is
> implied by the choice of ASCII (not UTF-8) as the fallback.
>
> Then there is the deprecated "public String (byte[] data, int high)", I
> see it as a failed attempt to fix things in a simple, but incorrect way.
>
> Overall, it looks like the designers of Java did not push for proper and
> consistent use of character encodings across the board when the language
> was still young.
>
> Over time, though, the Java standard library evolved to make consistent
> use of encodings, because only that is guaranteed to give correct results.
>
> -- Kostya
>
> 10.11.2010 20:36, Bob Kerns wrote:
> > It's clearly not logcat that's the issue here, because the two strings
> > output differently. He's expecting them to be the same for some
> > reason.
> >
> > It just now occurs to me that he may be assuming that the one-argument
> > version defaults to UTF-8; it defaults to *something*, but
> > something ill-specified that is probably never UTF-8. That's not how
> > it's worded, of course, but that's the effect.
> >
> > I couldn't begin to tell you how many bugs I've tracked down and fixed
> > in people's code due to this.
> >
> > On Nov 8, 12:19 pm, Kostya Vasilyev<[email protected]> wrote:
> >> I wouldn't count on logcat output to be always correct with respect to
> >> localization.
> >>
> >> What do you get if you use decoded strings in a TextView (for example)?
> >>
> >> -- Kostya
> >>
> >> 08.11.2010 19:46, Simon MacDonald wrote:
> >>> Hi all,
> >>> I'm wondering if I found a bug in Android.
> >>> When I run this code on my laptop:
> >>> String myData = "hockey,marché,football";
> >>> byte[] rawData;
> >>> rawData = myData.getBytes("UTF-8");
> >>> System.out.println("UTF-8 decoded: "+new String(rawData,"UTF-8"));
> >>> System.out.println("Default decoded: "+new String(rawData));
> >>> I get the output:
> >>> *UTF-8 decoded: hockey,marché,football*
> >>> *Default decoded: hockey,marché,football*
> >>> However, when I run the same code in an Android application and view
> >>> the output in "adb logcat" I get:
> >>> *D/FileUtils( 485): UTF-8 decoded: hockey,march∩┐╜,football*
> >>> *D/FileUtils( 485): Default decoded: hockey,march∩┐╜,football*
> >>> I get the same issue if I change the locale of my phone to French
> >>> (Canada) as well. It doesn't seem like French characters are getting
> >>> encoded properly.
> >>> Any thoughts?
> >>> Simon Mac Donald
> >>> http://hi.im/simonmacdonald
> >>> --
> >>> You received this message because you are subscribed to the Google
> >>> Groups "Android Developers" group.
> >>> To post to this group, send email to [email protected]
> >>> To unsubscribe from this group, send email to
> >>> [email protected]
> >>> For more options, visit this group at
> >>> http://groups.google.com/group/android-developers?hl=en
> >> --
> >> Kostya Vasilyev -- WiFi Manager + pretty widget
> >> -- http://kmansoft.wordpress.com
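
For anyone landing on this thread later, here is a minimal sketch (not from anyone's actual code in this thread) of the two ways to avoid depending on the default charset: name the charset explicitly, or use CharsetDecoder when you want invalid input reported instead of silently replaced. The class and method names (DecodeSketch, decodeUtf8, decodeUtf8Strict) are made up for illustration. It assumes Java 7 or later for StandardCharsets; on older platforms Charset.forName("UTF-8") should serve the same purpose.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class DecodeSketch {

    // Decode with an explicitly named charset, so the result does not
    // depend on file.encoding or any other platform default.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Decode with a CharsetDecoder configured to report bad input
    // instead of quietly substituting a replacement character.
    static String decodeUtf8Strict(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    public static void main(String[] args) throws Exception {
        String myData = "hockey,march\u00e9,football";
        byte[] rawData = myData.getBytes(StandardCharsets.UTF_8);

        System.out.println("UTF-8 decoded:  " + decodeUtf8(rawData));
        System.out.println("Strict decoded: " + decodeUtf8Strict(rawData));

        // The same text encoded as ISO-8859-1 is not valid UTF-8,
        // so the strict decoder throws rather than guessing.
        byte[] latin1 = myData.getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeUtf8Strict(latin1);
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

The point of the strict variant is that a charset mismatch shows up as an exception at the decode site, rather than as mangled characters somewhere downstream in a log or a TextView.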

