Hi all,
I recently ran into an encoding problem when using Cp1252 together with
UTF-8. A string whose UTF-8 bytes are decoded as Cp1252 cannot be
converted back:

    // Decode the UTF-8 bytes as Cp1252, then try to reverse the step.
    String name1 = new String("兆源".getBytes("UTF-8"), "Cp1252");
    String name2 = new String(name1.getBytes("Cp1252"), "UTF-8");
    // name2 is no longer equal to "兆源".
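
For anyone who wants to reproduce this, here is a minimal self-contained
sketch of the same round trip. The class and method names are made up for
illustration, and the string is written with Unicode escapes so the result
does not depend on the source file encoding. The UTF-8 bytes of the string
are E5 85 86 E6 BA 90, and the last one (0x90) is among the codes listed
below.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK or of the patch.
final class Cp1252RoundTrip {
    private static final Charset CP1252 = Charset.forName("Cp1252");

    // Decode the UTF-8 bytes of 'original' as Cp1252, then encode the
    // result as Cp1252 again and decode those bytes as UTF-8.
    static String roundTrip(String original) {
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
        return new String(misdecoded.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated hex, e.g. "E5 85 86".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u5146\u6E90"; // the two characters from the snippet above
        String restored = roundTrip(original);
        System.out.println("UTF-8 bytes: " + hex(original.getBytes(StandardCharsets.UTF_8)));
        System.out.println("round trip preserved the string: " + original.equals(restored));
    }
}
```

Pure ASCII input survives the same round trip, so the problem only shows
up when one of the UTF-8 bytes falls on a code that Cp1252 leaves
unassigned.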

It looks like the JDK's mapping table for Cp1252 has some incorrect
entries. The relevant codes are:

    0x83    0x0192    ;Latin Small Letter F With Hook
    0x8d    0x008d
    0x8f    0x008f
    0x90    0x0090
    0x9d    0x009d

    (from the Cp1252->Unicode map in
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt)
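
To see which byte values a given JDK leaves unmapped, something like the
following sketch could be used. It decodes each byte in 0x80 - 0x9F on its
own and reports the ones that come back as U+FFFD. The class and method
names are hypothetical. In the unpatched byteToCharTable shown in the diff
below, the \uFFFD entries sit at 0x81, 0x8D, 0x8F, 0x90 and 0x9D.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class, not part of the JDK or of the patch.
final class Cp1252Gaps {
    // Return every byte value in 0x80..0x9F that the charset decodes
    // to the replacement character U+FFFD.
    static List<Integer> unmappedBytes(Charset cs) {
        List<Integer> gaps = new ArrayList<Integer>();
        for (int b = 0x80; b <= 0x9F; b++) {
            String decoded = new String(new byte[] { (byte) b }, cs);
            if (decoded.equals("\uFFFD")) {
                gaps.add(b);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        for (int b : unmappedBytes(Charset.forName("Cp1252"))) {
            System.out.println(String.format("0x%02X has no mapping", b));
        }
    }
}
```

Running the same check against ISO-8859-1 reports nothing, since that
charset assigns a character to every byte value.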

After I cloned the repository at http://hg.openjdk.java.net/jdk6/jdk6
and fixed these codes in MS1252.java, the encoding error went away.

I guess this is the right place to discuss this problem; the patch is
attached. Any comments are appreciated.

Regards,
Eric


# HG changeset patch
# User Eric Liang <eric.l.2...@gmail.com>
# Date 1314903253 -28800
# Node ID bbb1c9ab9d0b76879b57e4c216a7ccba26facc14
# Parent  95ac2f7ddad0b8350d5ea3aed7d7d028c44396ba
fix incorrect codec dictionary in MS1252, that is, for the encoding: Cp1252

diff -r 95ac2f7ddad0 -r bbb1c9ab9d0b src/share/classes/sun/nio/cs/MS1252.java
--- a/src/share/classes/sun/nio/cs/MS1252.java	Wed Aug 24 15:11:00 2011 +0100
+++ b/src/share/classes/sun/nio/cs/MS1252.java	Fri Sep 02 02:54:13 2011 +0800
@@ -92,10 +92,10 @@
 
         private final static String byteToCharTable =
 
-            "\u20AC\uFFFD\u201A\u0192\u201E\u2026\u2020\u2021" +     // 0x80 - 0x87
-            "\u02C6\u2030\u0160\u2039\u0152\uFFFD\u017D\uFFFD" +     // 0x88 - 0x8F
-            "\uFFFD\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +     // 0x90 - 0x97
-            "\u02DC\u2122\u0161\u203A\u0153\uFFFD\u017E\u0178" +     // 0x98 - 0x9F
+            "\u20AC\u0192\u201A\u0192\u201E\u2026\u2020\u2021" +     // 0x80 - 0x87
+            "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +     // 0x88 - 0x8F
+            "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +     // 0x90 - 0x97
+            "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178" +     // 0x98 - 0x9F
             "\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" +     // 0xA0 - 0xA7
             "\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" +     // 0xA8 - 0xAF
             "\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" +     // 0xB0 - 0xB7
@@ -151,9 +151,9 @@
             "\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" +
             "\u0078\u0079\u007A\u007B\u007C\u007D\u007E\u007F" +
             "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
-            "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
-            "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
-            "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
+            "\u0000\u0000\u0000\u0000\u0000\u008D\u0000\u008F" +
+            "\u0090\u0000\u0000\u0000\u0000\u0000\u0000\u0000" +
+            "\u0000\u0000\u0000\u0000\u0000\u009D\u0000\u0000" +
             "\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7" +
             "\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF" +
             "\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7" +
