Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

Øistein E. Andersen Thu, 22 Oct 2009 13:25:11 -0700

On 22 Oct 2009, at 17:15, NARUSE, Yui wrote:

First, JIS-X-0208 and JIS-X-0212 are not in IANA Charsets,


I am not sure what you mean; they are both listed at
<http://www.iana.org/assignments/character-sets>:

Name: JIS_C6226-1983                                     [RFC1345,KXS2]
MIBenum: 63
Source: ECMA registry
Alias: iso-ir-87
Alias: x0208
Alias: JIS_X0208-1983
Alias: csISO87JISX0208

Name: JIS_X0212-1990                                     [RFC1345,KXS2]
MIBenum: 98
Source: ECMA registry
Alias: x0212
Alias: iso-ir-159
Alias: csISO159JISX02121990

moreover those correct names as spec are JIS X 0208 and JIS X 0212.

(The IANA registry is internally inconsistent and often disagrees with official standards when it comes to capitalisation, dashes/hyphens, underscores and spaces, so it is difficult to get this right. Please excuse me for not always paying due attention to such details in e-mails. Of course, the specifications should follow either IANA or the official standard as appropriate, depending on what it is referring to.)

Second, JIS_C6226-1983, JIS_X0212-1990, and EBCDICs are not
ASCII compatible. So they are out of discouraged; mustn't use.

EBCDIC is clearly not ASCII-compatible and may be unique amongst the character sets in the IANA registry in providing the full ASCII repertoire in a different arrangement.
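
To make "the full ASCII repertoire in a different arrangement" concrete, here is a minimal sketch. It assumes Python's built-in `cp037` codec as a stand-in for one EBCDIC variant, and the helper name `is_ascii_compatible` is purely illustrative; it only tests whether bytes 0x00-0x7F decode to the same code points as in ASCII, which is one possible reading of "ASCII-compatible".

```python
def is_ascii_compatible(codec: str) -> bool:
    """Return True if every byte 0x00-0x7F decodes to the same ASCII code point.

    Illustrative helper only: this is a narrow, byte-level notion of
    ASCII compatibility, not a definition taken from any specification.
    """
    for b in range(0x80):
        try:
            if bytes([b]).decode(codec) != chr(b):
                return False
        except UnicodeDecodeError:
            return False
    return True


if __name__ == "__main__":
    # cp037 is assumed here as a representative EBCDIC code page.
    for name in ("iso-8859-1", "utf-8", "cp037"):
        print(name, is_ascii_compatible(name))

    # "<" exists in cp037, but not at byte 0x3C, so an ASCII-based
    # HTML tokenizer would not recognise it as the start of a tag.
    print("<".encode("cp037").hex())
```

Under this test, ISO-8859-1 and UTF-8 pass while `cp037` fails, even though every ASCII character can still be encoded in it.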

JIS_C6226-1983 and JIS_X0212-1990 as defined in RFC1345 (i.e., on their own) do not contain basic ASCII characters at all, so it makes little sense to use them for HTML documents without adding ASCII or the ASCII-based JIS C 6220-1969, which would give something like EUC-JP or ISO-2022-JP. JIS_C6226-1983 contains wide versions of ASCII characters, but those are not interpreted as HTML mark-up (unless I am mistaken). JIS_X0212-1990 does not contain ASCII, kana or basic kanji, so it is of extremely limited usefulness on its own even in a plain-text setting. Warning against completely useless encodings seems pointless.
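
The point about wide characters can be sketched as follows. Python has no codec for bare JIS_C6226-1983, so this example assumes EUC-JP (`euc_jp`) as the container for the JIS X 0208 repertoire; the helper `contains_ascii_lt` is hypothetical and only checks for the byte 0x3C.

```python
import unicodedata

# FULLWIDTH LESS-THAN SIGN: the "wide" counterpart of "<" in JIS X 0208.
FULLWIDTH_LT = "\uFF1C"


def contains_ascii_lt(text: str, codec: str) -> bool:
    """Return True if the encoded form of text contains the byte 0x3C ("<" in ASCII)."""
    return 0x3C in text.encode(codec)


if __name__ == "__main__":
    # The wide character is a distinct code point from ASCII "<" ...
    print(FULLWIDTH_LT == "<")
    # ... and its EUC-JP encoding contains no 0x3C byte.
    print(contains_ascii_lt(FULLWIDTH_LT, "euc_jp"))
    # Only a compatibility normalisation folds it to "<"; whether any
    # HTML parser performs such folding is not claimed here.
    print(unicodedata.normalize("NFKC", FULLWIDTH_LT) == "<")
```

At the byte level, then, the wide less-than sign never looks like the ASCII character that opens a tag.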

Many other encodings in the IANA registry are ASCII-incompatible in different ways; what I do not understand is what makes the ones currently mentioned in the HTML5 draft particularly harmful.

Finally, Why ISO 2022 series is discouraged is not clear.


We agree on this point.

Anyway, most of charsets defined RFC 1345 are not clear.
Conversion table between [those charsets and] Unicode is needed.

Quite. Anne van Kesteren, I and several others are currently trying to document how browsers handle different encodings at <http://wiki.whatwg.org/wiki/Web_Encodings>, and defining mappings to Unicode is one of the goals. Your contribution would be much appreciated.


--
Øistein E. Andersen
