Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread NARUSE, Yui
Ian Hickson wrote: Authors should not use JIS-X-0208 (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on ISO-2022, and encodings based on EBCDIC. It is not clear what this means (e.g., the character set JIS_C6226-1983 in any encoding, or only when encoded alone according to

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Ian Hickson
On Fri, 23 Oct 2009, NARUSE, Yui wrote: The exact string isn't there, that's why I included the preferred MIME names in brackets in the spec. if it is talking about character encodings, why it uses the name of character sets mainly? Following seems better. Authors should not use

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Øistein E . Andersen
On 23 Oct 2009, at 04:20, Ian Hickson wrote: On Wed, 21 Oct 2009, Øistein E. Andersen wrote: ASCII-compatibility: The note in ‘2.1.5 Character encodings’ seems to say that [...] ISO-2022’[-*] are ASCII-compatible, whereas HZ-GB-2312 is not, and I cannot find anything in Section 2.1.5

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Ian Hickson
On Fri, 23 Oct 2009, Øistein E. Andersen wrote: On 23 Oct 2009, at 04:20, Ian Hickson wrote: On Wed, 21 Oct 2009, Øistein E. Andersen wrote: ASCII-compatibility: The note in ‘2.1.5 Character encodings’ seems to say that [...] ISO-2022’[-*] are ASCII-compatible, whereas HZ-GB-2312 is

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread NARUSE, Yui
Øistein E. Andersen wrote: Discouraged encodings: ‘4.2.5.5 Specifying the document's character encoding’ advises against certain encodings. (Incidentally, this advice probably deserves not to be ‘hidden’ in a section nominally reserved for character encoding *declaration* issues.) In

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread Øistein E . Andersen
On 22 Oct 2009, at 17:15, NARUSE, Yui wrote: First, JIS-X-0208 and JIS-X-0212 are not in IANA Charsets, I am not sure what you mean; they are both listed at http://www.iana.org/assignments/character-sets: Name: JIS_C6226-1983 [RFC1345,KXS2] MIBenum: 63

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread NARUSE, Yui
Øistein E. Andersen wrote: On 22 Oct 2009, at 17:15, NARUSE, Yui wrote: First, JIS-X-0208 and JIS-X-0212 are not in IANA Charsets, I am not sure what you mean; they are both listed at http://www.iana.org/assignments/character-sets: Name: JIS_C6226-1983

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread Philip Taylor
On Thu, Oct 22, 2009 at 9:23 PM, Øistein E. Andersen li...@coq.no wrote: On 22 Oct 2009, at 17:15, NARUSE, Yui wrote: Finally, Why ISO 2022 series is discouraged is not clear. We agree on this point. The string 숍訊昱穿 encoded as ISO-2022-KR is the bytes 0e 3c 73 63 72 69 70 74 3e. A UA that
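The byte sequence quoted in this message can be inspected directly. The following is a minimal Python sketch that assumes only the hex bytes given above; the variable names are illustrative, and it deliberately does not invoke an ISO-2022-KR decoder.

```python
# Bytes quoted in the message for the ISO-2022-KR example.
raw = bytes.fromhex("0e 3c 73 63 72 69 70 74 3e")

# 0x0E is the Shift Out (SO) control character, which ISO-2022-KR
# uses to switch to its double-byte character set.
SHIFT_OUT = 0x0E
starts_with_so = raw[0] == SHIFT_OUT

# View the remaining bytes as plain ASCII, as a consumer that does not
# understand the shift sequence might do (an assumption for illustration).
ascii_view = raw[1:].decode("ascii")

print(starts_with_so, ascii_view)
```

Read as ASCII, the bytes after the shift character spell a script start tag, which is presumably the concern the truncated message goes on to describe.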

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread Øistein E . Andersen
On 22 Oct 2009, at 22:45, Philip Taylor wrote: On Thu, Oct 22, 2009 at 9:23 PM, Øistein E. Andersen li...@coq.no wrote: On 22 Oct 2009, at 17:15, NARUSE, Yui wrote: Finally, Why ISO 2022 series is discouraged is not clear. We agree on this point. The string 숍訊昱穿 encoded as ISO-2022-KR is the

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread Ian Hickson
On Wed, 21 Oct 2009, Øistein E. Andersen wrote: ASCII-compatibility: The note in ‘2.1.5 Character encodings’ seems to say that ‘variants of ISO-2022’ (presumably including common ones like ISO-2022-CN, ISO-2022-KR and ISO-2022-JP) are ASCII-compatible, whereas HZ-GB-2312 is not, and I cannot

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-21 Thread Øistein E . Andersen
On 19 Oct 2009, at 05:52, Ian Hickson wrote: I've noted your e-mail here [...] and moved the whole thing out of the spec. That does not seem to apply to the last part of the original e-mail, quoted below. Øistein E. Andersen Other character encoding issues:

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-18 Thread Ian Hickson
On Sat, 18 Jul 2009, Øistein E. Andersen wrote: On 7 Jul 2009, at 09:25, Ian Hickson wrote: On Tue, 9 Jun 2009, Anne van Kesteren wrote: [S]hould HTML5 mention that Windows-932 maps to Windows-31J? (It does not appear in the IANA registry.) I've added this mapping too, just in case.

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-07-17 Thread Øistein E . Andersen
On 7 Jul 2009, at 09:25, Ian Hickson wrote: On Tue, 9 Jun 2009, Anne van Kesteren wrote: [S]hould HTML5 mention that Windows-932 maps to Windows-31J? (It does not appear in the IANA registry.) I've added this mapping too, just in case. Added x-sjis. What are the other mappings that would

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-07-07 Thread Ian Hickson
On Tue, 9 Jun 2009, Anne van Kesteren wrote: On Tue, 09 Jun 2009 01:42:57 +0200, Øistein E. Andersen li...@coq.no wrote: On 5 June 2009, Anne van Kesteren wrote: Is the implication here that Shift_JIS and Shift-JIS are distinct [...]? No, Shift-JIS and Windows-932 are commonly used

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-11 Thread Øistein E . Andersen
On 10 June 2009, at 09:06, Anne van Kesteren wrote: It is about adding aliases. If the alias added is also a distinct encoding conformance checkers are supposed to report on the differences. That probably has to be made more explicit, then. Personally I would be happy with making the

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-09 Thread Øistein E . Andersen
On 3 June 2009, at 23:19, Ian Hickson wrote: On Tue, 14 Apr 2009, Øistein E. Andersen wrote: HTML5 currently contains a table of encodings aliases, [...] GB2312 and GB_2312-80 technically refer to the *character set* GB 2312-80, [...]. GBK, on the other hand, is an encoding. [...] There is

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-08 Thread Øistein E . Andersen
On Tue, 14 Apr 2009, Øistein E. Andersen wrote: Shift_JIS Windows-31J [...] Shift-JIS Windows-932 On 5 June 2009, Anne van Kesteren wrote: Is the implication here that Shift_JIS and Shift-JIS are distinct [...]? No, Shift-JIS and Windows-932 are commonly used names/labels for the

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-05 Thread Anne van Kesteren
Is the implication here that Shift_JIS and Shift-JIS are distinct despite the encoding matching rules in Unicode not allowing for that? If that is the case I think we need new matching rules. If the implication is something else I'd like to know. On Thu, 04 Jun 2009 00:19:05 +0200, Ian
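To illustrate why Shift_JIS and Shift-JIS cannot be told apart under loose label matching, here is a hedged Python sketch. It assumes the matching rules meant are the charset alias matching rules of Unicode Technical Standard #22 (drop everything except ASCII letters and digits, fold case, drop zeros not preceded by a digit); the message itself only says "the encoding matching rules in Unicode", and the function name is hypothetical.

```python
import re


def normalize_label(label: str) -> str:
    """Loose normalisation of an encoding label (illustrative sketch only)."""
    # Keep only ASCII letters and digits, then fold to lowercase.
    stripped = re.sub(r"[^A-Za-z0-9]", "", label).lower()
    # Remove runs of zeros that are not preceded by another digit.
    return re.sub(r"(?<![0-9])0+", "", stripped)


same_label = normalize_label("Shift_JIS") == normalize_label("Shift-JIS")
print(normalize_label("Shift_JIS"), same_label)
```

Under these assumed rules both spellings collapse to the same key, so treating them as distinct encodings would require different matching rules.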

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-05 Thread Ian Hickson
On Fri, 5 Jun 2009, Anne van Kesteren wrote: Is the implication here that Shift_JIS and Shift-JIS are distinct despite the encoding matching rules in Unicode not allowing for that? If that is the case I think we need new matching rules. If the implication is something else I'd like to

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-05 Thread Ian Hickson
On Fri, 5 Jun 2009, Anne van Kesteren wrote: On Fri, 05 Jun 2009 10:14:46 +0200, Ian Hickson i...@hixie.ch wrote: On Fri, 5 Jun 2009, Anne van Kesteren wrote: Is the implication here that Shift_JIS and Shift-JIS are distinct despite the encoding matching rules in Unicode not allowing for

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-03 Thread Ian Hickson
I haven't made any changes to the spec based on the feedback below. Let me know if there's anything I missed. I'm not aware of any specific problems at this time. On Sat, 11 Apr 2009, Øistein E. Andersen wrote: On 22 May 2008, at 12:40, Ian Hickson wrote: Do you have input on the EUC-JP

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-03 Thread Ian Hickson
On Sun, 12 Apr 2009, Øistein E. Andersen wrote: On 2 Sep 2008, at 06:06, Ian Hickson wrote: On Wed, 30 Jul 2008, Øistein E. Andersen wrote: 1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252. IE7, on the other hand, simply ignores the high bit (as it does for

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-03 Thread Ian Hickson
On Tue, 14 Apr 2009, Øistein E. Andersen wrote: This e-mail is an attempt to give a relatively concise yet reasonably complete overview of non-Unicode character sets and encodings for `Chinese characters', excluding those which are not supported by at least one of the four browsers IE,

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-04-13 Thread Øistein E . Andersen
This e-mail is an attempt to give a relatively concise yet reasonably complete overview of non-Unicode character sets and encodings for `Chinese characters', excluding those which are not supported by at least one of the four browsers IE, Safari, Firefox and Opera (henceforth `all

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-04-12 Thread Øistein E . Andersen
On 2 Sep 2008, at 06:06, Ian Hickson wrote: On Wed, 30 Jul 2008, Øistein E. Andersen wrote: 1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252. IE7, on the other hand, simply ignores the high bit (as it does for a few other 7-bit encodings, by the way). Perhaps this
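The two behaviours described for documents labelled US-ASCII can be contrasted on a single byte. The following is a small Python sketch; the choice of byte 0xE9 and the variable names are assumptions made for illustration, not taken from the message.

```python
# A byte with the high bit set, which is outside 7-bit US-ASCII.
byte = 0xE9

# Behaviour attributed to Opera, Firefox and Safari:
# decode the byte as if the document were Windows-1252.
as_windows_1252 = bytes([byte]).decode("cp1252")

# Behaviour attributed to IE7: ignore the high bit and
# interpret the remaining seven bits as ASCII.
as_high_bit_ignored = chr(byte & 0x7F)

print(as_windows_1252, as_high_bit_ignored)
```

The same input byte therefore yields a different character depending on which of the two strategies a browser follows.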

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-04-11 Thread Øistein E . Andersen
On 22 May 2008, at 12:40, Ian Hickson wrote: Do you have input on the EUC-JP issue? I am now about to finish my analysis of CJK encodings (e-mail forthcoming), including EUC-JP. This encoding does not seem to be particularly problematic, however. Are you referring to a specific

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-09-01 Thread Ian Hickson
On Wed, 30 Jul 2008, Øistein E. Andersen wrote: The current table seems to cover the mappings between different common compatible 8-bit encodings as implemented in IE7, yes. The table at http://coq.no/character-tables/mime/en gives a bit more detail, most of which is better kept outside

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-07-29 Thread Øistein E . Andersen
On 22 May 2008, at 12:40, Ian Hickson wrote: would you say that what the spec says now is what browsers implement? What should we change? The current table seems to cover the mappings between different common compatible 8-bit encodings as implemented in IE7, yes. The table at

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-05-22 Thread Ian Hickson
On Thu, 13 Mar 2008, Øistein E. Andersen wrote: On 5th June 2007, Øistein E. Andersen wrote: (To do this properly, what we really ought to do is look for C1 and undefined characters in all IANA charsets and semi-official mappings to Unicode and check 1) whether the gaps can be filled by

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-03-16 Thread Øistein E . Andersen
Krzysztof Żelechowski wrote: Some characters, like digits, are direction-transparent [...] Inserting an LTR mark before them makes them LTR. Thanks. I would have preferred a solution which did not involve inserting extraneous characters, but I have now added LTR marks to fix the rendering.

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-03-13 Thread Krzysztof Żelechowski
On Thu, 13 Mar 2008, at 02:04 +0100, Øistein E. Andersen wrote: PPS: Some right-to-left characters contaminate surrounding characters as I have not yet found a simple solution to make everything strictly left-to-right (probably because I have not looked for it properly). Some

[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-03-12 Thread Øistein E . Andersen
On 5th June 2007, Øistein E. Andersen wrote: (To do this properly, what we really ought to do is look for C1 and undefined characters in all IANA charsets and semi-official mappings to Unicode and check 1) whether the gaps can be filled by borrowing from other encodings, and 2) whether