Don't use "Windows-31J": it is an encoding alias that Microsoft does not use for 
its codepage 932, so it can cause problems with other compliant JVMs.

Better to use "CP932", which seems to be the canonical name used by Sun in its reference 
implementation, or "windows-932", which is documented in Microsoft's codepage 
documentation and is accepted by IBM's JVM and by other Java runtime libraries.
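
Since name support varies between runtimes, one way to check before committing to a 
name is to ask the JVM itself. The following is only a minimal sketch: the class and 
method names (CharsetNameProbe, probe) are made up for illustration, and the result 
depends entirely on the JVM it runs on, so it may not match what is described here.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class CharsetNameProbe {
    // Maps each candidate name to the canonical name the running JVM resolves
    // it to, or to null when the JVM does not recognize the name.
    static Map<String, String> probe(String... names) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            String canonical = null;
            try {
                if (Charset.isSupported(name)) {
                    canonical = Charset.forName(name).name();
                }
            } catch (IllegalArgumentException e) {
                // The name is syntactically illegal; treat it as unsupported.
            }
            result.put(name, canonical);
        }
        return result;
    }

    public static void main(String[] args) {
        // Names taken from this message; the answers vary by JVM vendor and version.
        probe("Windows-31J", "CP932", "windows-932", "Shift_JIS", "EUC-JP")
            .forEach((name, canonical) -> System.out.println(
                name + " -> " + (canonical == null ? "(not supported)" : canonical)));
    }
}
```

Running this on each target JVM shows which of the names are safe to hard-code there.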

There's an interesting comparison of encoding alias names in IBM's ICU reference docs, 
and even a runtime ICU table used to disambiguate alias names according to their 
usage context.
Look at the "icu/source/data/mappings/convrtrs.txt" file in ICU's online CVS 
repository. It lists a lot of aliases along with their preferred usage.
A small part of this file reads:

# CJK encodings

ibm-942_P12A-1999 { UTR22* }    # ibm-942_P120 is a rarely used alternate mapping (sjis78 is already old)
                        ibm-942 { IBM* }
                        ibm-932 { IBM }
                        cp932
                        shift_jis78
                        sjis78
                        ibm-942_VSUB_VPUA
                        ibm-932_VSUB_VPUA
                        # Is this "JIS_C6226-1978"?

ibm-943_P14A-1999 { UTR22* }
                        ibm-943 # Leave untagged because this isn't the default
                        Shift_JIS { IANA* MIME* WINDOWS JAVA }
                        MS_Kanji { IANA WINDOWS JAVA }
                        csShiftJIS { IANA WINDOWS JAVA }
                        windows-31j { IANA JAVA } # A further extension of Shift_JIS to include NEC special characters (Row 13)
                        csWindows31J { IANA WINDOWS JAVA } # A further extension of Shift_JIS to include NEC special characters (Row 13)
                        x-sjis { WINDOWS JAVA }
                        x-ms-cp932 { WINDOWS }
                        cp932 { WINDOWS }
                        windows-932 { WINDOWS* }
                        cp943c { JAVA* }    # This is slightly different, but the backslash mapping is the same.
                        ms932
                        pck     # Probably SOLARIS
                        sjis    # This might be for ibm-1351
                        ibm-943_VSUB_VPUA
                        # cp943 # This isn't Windows, and no one else uses it.
                        # IANA says that Windows-31J is an extension to csshiftjis ibm-932

(...)
ibm-33722_P12A-1999 { UTR22* }
                        ibm-33722   # Leave untagged because this isn't the default
                        ibm-5050    # Leave untagged because this isn't the default, and yes this alias is correct
                        EUC-JP { IANA MIME* WINDOWS JAVA* }
                        Extended_UNIX_Code_Packed_Format_for_Japanese { IANA* WINDOWS JAVA }
                        csEUCPkdFmtJapanese { IANA WINDOWS JAVA }
                        X-EUC-JP { WINDOWS JAVA }   # Japan EUC. x-euc-jp is a MIME name
                        eucjis { JAVA }
                        windows-51932 { WINDOWS* }
                        ibm-33722_VPUA
                        IBM-eucJP
(...)
# These were removed due to age, and they are rarely used.
#(...)
#ibm-942_P120-1999 { UTR22* }
#                        #ibm-942 { IBM* }
#                        ibm-942_VASCII_VSUB_VPUA
#                        #ibm-932 { IBM }
#                        ibm-932_VASCII_VSUB_VPUA   # Old s_jis

The preferred aliases for Java are marked with { JAVA* }, and other possible 
aliases for Java are marked with { JAVA } without the asterisk.
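
The ICU table describes what ICU knows about Java; to see what a given JVM actually 
registers, its own alias list can be dumped. A rough sketch follows, with the 
caveat that CharsetAliasLister and namesFor are names invented for this example and 
that the set returned differs between JVM vendors and versions.

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
class CharsetAliasLister {
    // Returns the canonical name plus every alias the running JVM registers
    // for the given charset name, or an empty set if the name is unsupported.
    static Set<String> namesFor(String name) {
        // Charset names are case-insensitive, so compare them that way.
        Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        if (Charset.isSupported(name)) {
            Charset cs = Charset.forName(name);
            names.add(cs.name());        // canonical name in this JVM
            names.addAll(cs.aliases());  // every other accepted spelling
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : new String[] { "Shift_JIS", "EUC-JP" }) {
            System.out.println(name + ": " + namesFor(name));
        }
    }
}
```

Comparing this output with the { JAVA } and { JAVA* } tags above shows where a 
particular runtime agrees or disagrees with the ICU table.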

So "Shift_JIS" is the preferred alias for IANA and MIME, and a non-preferred but 
recognized alias for WINDOWS and JAVA.
"x-ms-cp932" and "cp932" are used by Windows as aliases for Shift_JIS, but they do not 
designate the same encoding as the one used in Windows.

"windows-51932" (the preferred name in Windows) is the character set that Java and 
MIME preferably designate as "EUC-JP" (this alias is also recognized, but not 
preferred, by IANA and Windows).

So in Java, I would recommend using "Shift_JIS" for the base standard, and "EUC-JP" 
for the extension used in Windows codepage 932 on Windows 2000/XP. (The 932 codepage 
is a placeholder in Windows whose mapping to an actual encoding depends on the OS 
version, much like the "ANSI" and "OEM" codepages, which vary across systems.)
The newest 932 codepage is in fact codepage 51932, preferably named "EUC-JP" in Java.

The oldest one is "Shift_JIS", which was mapped to codepage 932, but its use is not 
recommended on newer versions of Windows because the mappings conflict (some documents 
created on Windows 95/98/ME or NT4 do not show the same characters on Windows 2000 and 
XP!). Microsoft changed the mapping but kept the same codepage number, thinking this 
would make it easier for users to migrate systems that use "932" in their batch files.
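
Whether two encoding names really produce different characters for the same document 
can be tested directly by decoding the same bytes under both names. This is only a 
sketch under assumptions: DecodeComparison and decodesDifferently are invented names, 
the two-byte sample and the default pair of names are arbitrary picks, and the answer 
depends on the mapping tables shipped with the JVM in use.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
class DecodeComparison {
    // Decodes the same bytes under two charset names and reports whether the
    // resulting text differs. Both names must be supported by the running JVM.
    static boolean decodesDifferently(byte[] bytes, String nameA, String nameB) {
        String a = new String(bytes, Charset.forName(nameA));
        String b = new String(bytes, Charset.forName(nameB));
        return !a.equals(b);
    }

    public static void main(String[] args) {
        // Arbitrary double-byte sample; substitute bytes from a real document.
        byte[] sample = { (byte) 0x81, (byte) 0x60 };
        String nameA = args.length > 0 ? args[0] : "Shift_JIS";
        String nameB = args.length > 1 ? args[1] : "Windows-31J";
        if (Charset.isSupported(nameA) && Charset.isSupported(nameB)) {
            System.out.println(nameA + " vs " + nameB + " differ: "
                + decodesDifferently(sample, nameA, nameB));
        } else {
            System.out.println("One of the names is not supported by this JVM.");
        }
    }
}
```

Running it over the bytes of a real document, rather than the sample above, gives a 
quick indication of whether that document is affected by the mapping change.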

Note the conflict above over the alias name "cp932": its use as an alias for 
"shift_jis78" exists only in ICU, not in IANA, Java, MIME or Windows. On Windows it 
now designates "Shift_JIS" (internally codepage 943 in Windows).

"Windows-31J" also contains proprietary NEC extensions to Shift_JIS, but it is not 
strictly the encoding used in Windows codepage 51932. It is used only on NEC systems 
(including NEC's OEM version of Windows, which handles it internally as codepage 
number 31) and is not recommended for data and application interchange or portability.

-- Philippe.

