Hi all,
I recently ran into an encoding problem when mixing Cp1252 and UTF-8.
UTF-8 bytes that are decoded as Cp1252 cannot be converted back to the
original string:
String name1 = new String("兆源".getBytes("UTF-8"), "Cp1252");
String name2 = new String(name1.getBytes("Cp1252"), "UTF-8");
// name2 is no longer equal to "兆源"
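A minimal, self-contained sketch of the round trip above, for anyone who wants to reproduce it. The class name RoundTripDemo and the helper roundTrip are hypothetical names used only for illustration. For comparison it also runs the same round trip through ISO-8859-1, which assigns a character to every byte value; that comparison is an addition here, not part of the patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of the patch.
final class RoundTripDemo {

    // Encode `text` as UTF-8, decode those bytes with `via`,
    // then encode with `via` again and decode the result as UTF-8.
    static String roundTrip(String text, Charset via) {
        String misdecoded = new String(text.getBytes(StandardCharsets.UTF_8), via);
        return new String(misdecoded.getBytes(via), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u5146\u6E90" is the string from the report; its UTF-8 form
        // is E5 85 86 E6 BA 90, so it contains the byte 0x90.
        String original = "\u5146\u6E90";
        Charset cp1252 = Charset.forName("windows-1252"); // canonical name of Cp1252

        System.out.println("Cp1252 round trip intact:     "
                + original.equals(roundTrip(original, cp1252)));
        System.out.println("ISO-8859-1 round trip intact: "
                + original.equals(roundTrip(original, StandardCharsets.ISO_8859_1)));
    }
}
```

On an unpatched JDK the first check should come out false, because 0x90 is decoded to U+FFFD and then encoded back as '?'. The second should come out true.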
It looks like some of the JDK's Cp1252 mappings are incorrect. The
relevant entries are:
0x83 0x0192 ;Latin Small Letter F With Hook
0x8d 0x008d
0x8f 0x008f
0x90 0x0090
0x9d 0x009d
(from the Cp1252 -> Unicode mapping in
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
)
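To see which bytes a given JDK leaves unmapped, something like the following sketch could be used. It decodes each byte in the range 0x80 to 0x9F on its own and collects those that come back as the replacement character U+FFFD. The class name Cp1252Gaps and the method undefinedBytes are hypothetical, and the result depends on the JDK being run.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK or of the patch.
final class Cp1252Gaps {

    // Returns the byte values in 0x80..0x9F that `cs` decodes to U+FFFD.
    static List<Integer> undefinedBytes(Charset cs) {
        List<Integer> gaps = new ArrayList<>();
        for (int b = 0x80; b <= 0x9F; b++) {
            String decoded = new String(new byte[] {(byte) b}, cs);
            if (decoded.equals("\uFFFD")) {
                gaps.add(b);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // canonical name of Cp1252
        for (int b : undefinedBytes(cp1252)) {
            System.out.printf("0x%02X%n", b);
        }
    }
}
```

Any byte that shows up in this list cannot survive a decode/encode round trip through the charset, which is the failure described above.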
After I cloned the repository at http://hg.openjdk.java.net/jdk6/jdk6
and fixed these mappings in MS1252.java, the encoding error went away.
I guess this is the right place to discuss this problem; the patch
is attached. Any comments are appreciated.
Regards,
Eric
# HG changeset patch
# User Eric Liang <[email protected]>
# Date 1314903253 -28800
# Node ID bbb1c9ab9d0b76879b57e4c216a7ccba26facc14
# Parent 95ac2f7ddad0b8350d5ea3aed7d7d028c44396ba
fix incorrect codec dictionary in MS1252, that is, for the encoding: Cp1252
diff -r 95ac2f7ddad0 -r bbb1c9ab9d0b src/share/classes/sun/nio/cs/MS1252.java
--- a/src/share/classes/sun/nio/cs/MS1252.java Wed Aug 24 15:11:00 2011 +0100
+++ b/src/share/classes/sun/nio/cs/MS1252.java Fri Sep 02 02:54:13 2011 +0800
@@ -92,10 +92,10 @@
private final static String byteToCharTable =
- "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021" + // 0x80 - 0x87
- "\u02C6\u2030\u0160\u2039\u0152\uFFFD\u017D\uFFFD" + // 0x88 - 0x8F
- "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014" + // 0x90 - 0x97
- "\u02DC\u2122\u0161\u203A\u0153\uFFFD\u017E\u0178" + // 0x98 - 0x9F
+ "\u20AC\u0192\u201A\u0192\u201E\u2026\u2020\u2021" + // 0x80 - 0x87
+ "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" + // 0x88 - 0x8F
+ "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" + // 0x90 - 0x97
+ "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178" + // 0x98 - 0x9F
"\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" + // 0xA0 - 0xA7
"\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" + // 0xA8 - 0xAF
"\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" + // 0xB0 - 0xB7
@@ -151,9 +151,9 @@
"\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" +
"\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F" +
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
- "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
- "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
- "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
+ "\u0000\u0000\u0000\u0000\u0000\u008D\u0000\u008F" +
+ "\u0090\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
+ "\u0000\u0000\u0000\u0000\u0000\u009D\u0000\u0000" +
"\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" +
"\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" +
"\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" +