Hi all,

I recently ran into an encoding error when using Cp1252 together with UTF-8. A string built by decoding UTF-8 bytes as Cp1252 cannot be converted back to the original:

    String name1 = new String("兆源".getBytes("UTF-8"), "Cp1252");
    String name2 = new String(name1.getBytes("Cp1252"), "UTF-8");

It looks like some of the Cp1252 mappings in the JDK are incorrect. The codes in question are:

    0x83  0x0192  ;Latin Small Letter F With Hook
    0x8d  0x008d
    0x8f  0x008f
    0x90  0x0090
    0x9d  0x009d

(from the Cp1252 to Unicode map in http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt)

After I cloned the repository at http://hg.openjdk.java.net/jdk6/jdk6 and fixed these codes in MS1252.java, the encoding error went away.

I guess this is the right place to discuss this problem. The patch is attached. Any comments are appreciated.

Regards,
Eric

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCM/CS/E/MU/P d+(-) s: a- C++ UL$ P+>++ L++ E++ W++ N+ o+>++ K+++ w !O
M-(+) V-- PS+ PE+ Y+ PGP++ t? 5? X? R+>* tv@ b++++ DI-- D G++ e++>+++@ h*
r !y+
------END GEEK CODE BLOCK------
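
To make the failure easier to reproduce, here is a minimal sketch that round-trips the UTF-8 bytes of "兆源" (E5 85 86 E6 BA 90) through Cp1252 and reports the first byte that changes. The class and method names (Cp1252RoundTripDemo, roundTrip, firstMismatch) are hypothetical and only for illustration. On an unpatched JDK I would expect the last byte, 0x90, to come back as 0x3F ('?'), because 0x90 has no mapping in the JDK's Cp1252 table and is decoded to U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not part of the JDK or the patch.
class Cp1252RoundTripDemo {
    // "Cp1252" is the JDK alias for windows-1252, as used in the message above.
    static final Charset CP1252 = Charset.forName("Cp1252");

    // Decode the bytes as Cp1252, then encode the resulting String back to Cp1252.
    static byte[] roundTrip(byte[] original) {
        String decoded = new String(original, CP1252);
        return decoded.getBytes(CP1252);
    }

    // Index of the first position where the arrays differ, or -1 if they are identical.
    static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) {
        // Unicode escapes for the two characters, to avoid source-file encoding issues.
        byte[] utf8 = "\u5146\u6E90".getBytes(StandardCharsets.UTF_8);
        byte[] back = roundTrip(utf8);
        int i = firstMismatch(utf8, back);
        if (i < 0) {
            System.out.println("round trip preserved all bytes");
        } else {
            System.out.printf("byte %d changed: 0x%02X -> 0x%02X%n",
                    i, utf8[i] & 0xFF, back[i] & 0xFF);
        }
    }
}
```

If only the bytes need to survive the trip through a String, ISO-8859-1 may be a safer carrier than Cp1252, since it maps every byte value to a character.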
# HG changeset patch
# User Eric Liang <eric.l.2...@gmail.com>
# Date 1314903253 -28800
# Node ID bbb1c9ab9d0b76879b57e4c216a7ccba26facc14
# Parent  95ac2f7ddad0b8350d5ea3aed7d7d028c44396ba
fix incorrect codec dictionary in MS1252, that is, for the encoding: Cp1252

diff -r 95ac2f7ddad0 -r bbb1c9ab9d0b src/share/classes/sun/nio/cs/MS1252.java
--- a/src/share/classes/sun/nio/cs/MS1252.java	Wed Aug 24 15:11:00 2011 +0100
+++ b/src/share/classes/sun/nio/cs/MS1252.java	Fri Sep 02 02:54:13 2011 +0800
@@ -92,10 +92,10 @@
 
         private final static String byteToCharTable =
 
-            "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021" +     // 0x80 - 0x87
-            "\u02C6\u2030\u0160\u2039\u0152\uFFFD\u017D\uFFFD" +     // 0x88 - 0x8F
-            "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +     // 0x90 - 0x97
-            "\u02DC\u2122\u0161\u203A\u0153\uFFFD\u017E\u0178" +     // 0x98 - 0x9F
+            "\u20AC\u0192\u201A\u0192\u201E\u2026\u2020\u2021" +     // 0x80 - 0x87
+            "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +     // 0x88 - 0x8F
+            "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +     // 0x90 - 0x97
+            "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178" +     // 0x98 - 0x9F
             "\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" +     // 0xA0 - 0xA7
             "\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" +     // 0xA8 - 0xAF
             "\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" +     // 0xB0 - 0xB7
@@ -151,9 +151,9 @@
             "\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" +
             "\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F" +
             "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
-            "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
-            "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
-            "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
+            "\u0000\u0000\u0000\u0000\u0000\u008D\u0000\u008F" +
+            "\u0090\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
+            "\u0000\u0000\u0000\u0000\u0000\u009D\u0000\u0000" +
             "\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" +
             "\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" +
             "\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" +
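
For anyone who wants to check which slots of the decode table are affected on their own JDK, the following is a rough sketch that decodes every byte in 0x80 to 0x9F and collects those that come back as U+FFFD. The class name Cp1252UnmappedBytes and the method unmapped are hypothetical helpers, not JDK API. On an unpatched JDK I would expect it to report 0x81, 0x8D, 0x8F, 0x90 and 0x9D for Cp1252.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the JDK or of the patch above.
class Cp1252UnmappedBytes {
    // Returns the byte values in 0x80..0x9F that the charset decodes to U+FFFD.
    static List<Integer> unmapped(Charset cs) {
        List<Integer> result = new ArrayList<>();
        for (int b = 0x80; b <= 0x9F; b++) {
            // new String(byte[], Charset) substitutes U+FFFD for unmappable input.
            String s = new String(new byte[] { (byte) b }, cs);
            if (s.equals("\uFFFD")) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int b : unmapped(Charset.forName("Cp1252"))) {
            System.out.printf("0x%02X has no mapping%n", b);
        }
    }
}
```

Running the same check against ISO-8859-1 should return an empty list, which is a quick way to confirm the loop itself is behaving.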