Good catch, Bob.

You are right of course - I missed the lack of encoding parameter in the second call.

The docs say this:

http://developer.android.com/reference/java/lang/String.html#String(byte[])

Converts the byte array to a string using the default encoding as specified by the file.encoding system property. If the system property is not defined, the default encoding is ISO8859_1 (ISO-Latin-1). If 8859-1 is not available, an ASCII encoding is used.

It looks like the default encoding is quite likely to be single-byte; this is implied by the choice of ASCII (not UTF-8) as the fallback.
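To make the effect concrete, here is a minimal sketch of what happens when UTF-8 bytes are decoded with a single-byte charset. It assumes the platform default behaves like ISO-8859-1, as the docs quoted above suggest; I have not checked what a given device actually uses. The class and method names are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes the bytes as ISO-8859-1, standing in for an assumed
    // single-byte platform default.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes the bytes with the same charset that produced them.
    static String decodeAsUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The e-acute is written as a Unicode escape so the source file's
        // own encoding cannot affect the result.
        String original = "hockey,march\u00e9,football";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // The explicit UTF-8 decode round-trips.
        System.out.println(decodeAsUtf8(raw).equals(original));

        // The single-byte decode turns the two-byte sequence C3 A9 into
        // two separate characters, so the string grows by one.
        System.out.println(decodeAsLatin1(raw).length() - original.length());
    }
}
```

The point is only that the one-argument constructor is equivalent to picking some charset for you, and the result is right only if that charset happens to match the one used for getBytes.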

Then there is the deprecated "public String (byte[] data, int high)" constructor, which I see as a failed attempt to fix things in a simple but incorrect way.
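For the curious, a small sketch of what that deprecated constructor does, going by the standard JDK documentation (where the parameters are named ascii and hibyte): each byte becomes the low 8 bits of a char, and the given value becomes the high 8 bits. The class and helper names below are hypothetical.

```java
public class HiByteDemo {
    // Thin wrapper around the deprecated constructor: every byte is widened
    // to a char whose high byte is (high & 0xff).
    @SuppressWarnings("deprecation")
    static String widen(byte[] data, int high) {
        return new String(data, high);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of e-acute: with high = 0 each byte is treated as its
        // own character, so no multi-byte decoding takes place.
        byte[] eAcuteUtf8 = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(widen(eAcuteUtf8, 0).length());

        // With high = 0x04, the byte 0x10 becomes U+0410.
        System.out.println((int) widen(new byte[] {0x10}, 0x04).charAt(0));
    }
}
```

So it can only express "all characters share one high byte", which is not how any real multi-byte encoding works.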

Overall, it looks like the designers of Java did not push for proper and consistent use of character encodings across the board when the language was still young.

Over time, though, the Java standard library evolved to make consistent use of encodings, because only that is guaranteed to give correct results.
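As one illustration of that later style, a sketch of reading text through a stream with the charset named explicitly rather than left to the default. The class and method names are hypothetical, the byte array stands in for a file or socket, and it assumes Java 8 or later (for UncheckedIOException).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetIo {
    // Reads the first line of UTF-8 encoded data, naming the charset on the
    // reader instead of relying on the platform default.
    static String readFirstLine(byte[] utf8Bytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "hockey,march\u00e9,football\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFirstLine(raw).equals("hockey,march\u00e9,football"));
    }
}
```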

-- Kostya


10.11.2010 20:36, Bob Kerns wrote:
It's clearly not logcat that's the issue here, because the two strings
output differently. He's expecting them to be the same for some
reason.

It just now occurs to me that he may be assuming that the
one-argument version defaults to UTF-8; it defaults to *something*, but
something ill-specified that is probably never UTF-8. That's not how
it's worded, of course, but that's the effect.

I couldn't begin to tell you how many bugs I've tracked down and fixed
in people's code due to this.

On Nov 8, 12:19 pm, Kostya Vasilyev<[email protected]>  wrote:
I wouldn't count on logcat output to be always correct with respect to
localization.

What do you get if you use decoded strings in a TextView (for example)?

-- Kostya

08.11.2010 19:46, Simon MacDonald wrote:

Hi all,
I'm wondering if I found a bug in Android.  When I run this code on my
laptop:
String myData = "hockey,marché,football";
byte[] rawData;
rawData = myData.getBytes("UTF-8");
System.out.println("UTF-8 decoded: "+new String(rawData,"UTF-8"));
System.out.println("Default decoded: "+new String(rawData));
I get the output:
*UTF-8 decoded: hockey,marché,football*
*Default decoded: hockey,marché,football*
However, when I run the same code in an Android application and view
the output in "adb logcat" I get:
*D/FileUtils(  485): UTF-8 decoded: hockey,march�,football*
*D/FileUtils(  485): Default decoded: hockey,march�,football*
I get the same issue if I change the locale of my phone to French
(Canada) as well.  It doesn't seem like French characters are getting
encoded properly.
Any thoughts?
Simon Mac Donald
http://hi.im/simonmacdonald
--
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en
--
Kostya Vasilyev -- WiFi Manager + pretty widget -- http://kmansoft.wordpress.com

