Re: Any character encoding experts out there?

Vamsavardhana Reddy Thu, 23 Feb 2006 21:02:42 -0800

Hi Rick,

See the byte to character map at http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx .

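Char \u0081 is not mapped to any byte in Cp1252, so String.getBytes("Cp1252") returns a '?', which has the decimal byte value 63. Decoding that byte then gives back '?', which is why the last char of the rebuilt string is 63 rather than 129. The same will happen with any char that is not mapped.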
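
If it helps, here is a small sketch of how the substitution could be detected instead of happening silently. The class name Cp1252Check and its helper methods are made up for illustration, and it assumes a JRE where the charset is available under the name "windows-1252" (an alias of "Cp1252") and where String.getBytes(Charset) exists (Java 6 or later):

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, for illustration only.
public class Cp1252Check {
    // "windows-1252" is the java.nio name for the charset also known as "Cp1252".
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True if the charset has a byte mapping for this char.
    static boolean isMappable(char c) {
        return CP1252.newEncoder().canEncode(c);
    }

    // Last byte produced by String.getBytes(), as an unsigned value.
    // getBytes() silently replaces unmappable chars with the charset's
    // replacement byte, which is '?' (63) here.
    static int lastByteOf(String s) {
        byte[] bytes = s.getBytes(CP1252);
        return bytes[bytes.length - 1] & 0xFF;
    }

    // Encodes with an encoder told to report unmappable chars, so the
    // loss shows up as an exception instead of a '?'.
    static boolean encodesLosslessly(String s) {
        try {
            CP1252.newEncoder()
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "Yada, yada\u0081";
        System.out.println("mappable: " + isMappable('\u0081'));
        System.out.println("last byte: " + lastByteOf(s));
        System.out.println("lossless: " + encodesLosslessly(s));
    }
}
```

Checking with canEncode() or a reporting encoder before calling getBytes() would tell you up front that the round trip cannot preserve the string.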

Regards,
Vamsi

On 2/23/06, Rick McGuire <[EMAIL PROTECTED]> wrote:

I'm currently trying to sort out a problem with my implementation of the
MimeUtility class in the javamail specs.  For the
encodeWord()/decodeWord() methods, my encoding encodes to the same value
as the Sun implementation, but the decoding is driving me nuts.  I'm
able to successfully decode this into what should be the correct byte[]
array, but when used to instantiate the String value, I'm getting a
bogus character value.

Playing around with this, I've discovered that the problem seems to be
occurring with the String constructor, and can be demonstrated without
using the javamail code at all.  Here is a little snippet that shows the
problem:

        String longString = "Yada, yada\u0081";

        try {
            // get the bytes using Cp1252
            byte[] bytes = longString.getBytes("Cp1252");

            // create a new string item using the same code page
            String newString = new String(bytes, 0, bytes.length, "Cp1252");

            // last char of original is int 129, last char of rebuilt string is int 63.
            System.out.println(">>>>> original string = " + longString
                    + " rebuilt string = " + newString);
            System.out.println(">>>>> original string = "
                    + (int) longString.charAt(longString.length() - 1)
                    + " rebuilt string = "
                    + (int) newString.charAt(newString.length() - 1));
        } catch (Exception e) {
            // report the failure instead of silently swallowing it
            e.printStackTrace();
        }

63 is the last value in the byte array after the getBytes() call, and
the Sun impl of MimeUtility.encodeWord() returns the string
"=?Cp1252?Q?Yada,_yada=3F?=" (0x3F == 63), so the correct value is
getting extracted.  I'm at a loss to figure out why the round trip coded
above is corrupting the data.  What am I missing here?

Rick
