Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-12 Thread Philip Jägenstedt
On Mon, 09 Apr 2012 03:08:20 +0200, Øistein E. Andersen li...@coq.no wrote: On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote: On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen li...@coq.no wrote: Suggested change: map C6CD to U+5E7A. These are the existing mappings: C6CD =

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-12 Thread Øistein E. Andersen
On 12 Apr 2012, at 08:26, Philip Jägenstedt wrote: Possibly, one could argue that U+2F33 normalizes (NFKC) to U+5E7A, but it's not the only hanzi in HKSCS-2008 that normalizes into something else: That the characters in the above list look slightly different is really a font issue, they

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-10 Thread Øistein E. Andersen
On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote: On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen li...@coq.no wrote: [1] http://coq.no/character-tables/eten1.pdf http://coq.no/character-tables/eten1.js What is the source for the mappings in eten1.pdf? Unihan H was

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-08 Thread Philip Jägenstedt
On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen li...@coq.no wrote: On Fri Apr 6 14:03:22 PDT 2012, Philip Jägenstedt philipj at opera.com wrote: So, http://people.opera.com/philipj/2012/04/06/big5-foolip.txt is the mapping I suggest, with 18594 defined mappings and 1188 U+FFFD.

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-08 Thread Anne van Kesteren
On Sun, 08 Apr 2012 19:03:58 +0200, Philip Jägenstedt phil...@opera.com wrote: Anne, how do you plan to define encoders for tables with duplicate mappings? Have you collected data for what browsers currently do? I have not looked at encoders much. I looked at one encoding briefly (don't

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-08 Thread Øistein E. Andersen
On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote: On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen li...@coq.no wrote: [...] [1] http://coq.no/character-tables/eten1.pdf http://coq.no/character-tables/eten1.js What is the source for the mappings in eten1.pdf? There is no

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-07 Thread Anne van Kesteren
On Fri, 06 Apr 2012 23:03:22 +0200, Philip Jägenstedt phil...@opera.com wrote: So, http://people.opera.com/philipj/2012/04/06/big5-foolip.txt is the mapping I suggest, with 18594 defined mappings and 1188 U+FFFD. Awesome Philip! The specification has been updated accordingly. I guess I

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-07 Thread Øistein E. Andersen
On Fri Apr 6 14:03:22 PDT 2012, Philip Jägenstedt philipj at opera.com wrote: So, http://people.opera.com/philipj/2012/04/06/big5-foolip.txt is the mapping I suggest, with 18594 defined mappings and 1188 U+FFFD. (Second byte 0xA1 appears as 0x7F in the mapping file.) Your table is very

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-07 Thread Øistein E. Andersen
On 7 Apr 2012, at 15:04, Øistein E. Andersen wrote: Suggested reverse mappings: [...] C6DE = U+3003 C6DF = U+4EDD Sorry, these are different from the other C6xx (ETen-1) mappings. Correction: A1B2 = U+3003 C969 = U+4EDD Rationale: These codepoints are part of the original (unextended)

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-06 Thread Philip Jägenstedt
On Wed, 04 Apr 2012 18:05:14 +0200, Anne van Kesteren ann...@opera.com wrote: On Fri, 30 Mar 2012 14:00:38 +0200, Anne van Kesteren ann...@opera.com wrote: Ideally someone does detailed content analysis to figure out what the best path forward is here, though I'm not entirely sure how. I

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-06 Thread Philip Jägenstedt
On Fri, 06 Apr 2012 12:54:53 +0200, Philip Jägenstedt phil...@opera.com wrote: As a starting point for the spec, I suggest taking the intersection of opera-hk, firefox-hk and chrome-hk. I've written a script in https://gitorious.org/whatwg/big5 to generate the mapping that I think makes
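
A minimal sketch of the intersection idea mentioned in this message, assuming each browser's table is available as a dictionary from two-byte sequence to code point. The variable names mirror the labels used in the thread, but the sample entries are invented placeholders rather than real browser data, and the actual script in the gitorious repository may well work differently.

```python
def intersect_mappings(*tables):
    """Keep only the byte sequences that every table maps to the same code point."""
    first, *rest = tables
    return {
        seq: cp
        for seq, cp in first.items()
        if all(other.get(seq) == cp for other in rest)
    }


# Hypothetical placeholder tables, for illustration only (not real browser output).
opera_hk = {b"\xa4\x40": 0x4E00, b"\xa4\x41": 0xFFFD, b"\xa4\x42": 0x4E01}
firefox_hk = {b"\xa4\x40": 0x4E00, b"\xa4\x41": 0x4E59, b"\xa4\x42": 0x4E02}
chrome_hk = {b"\xa4\x40": 0x4E00, b"\xa4\x41": 0x4E59}

# Only the entry on which all three tables agree survives.
common = intersect_mappings(opera_hk, firefox_hk, chrome_hk)
for seq, cp in common.items():
    print(seq.hex().upper(), f"U+{cp:04X}")
```

Sequences on which the tables disagree, or which one table lacks, drop out of the result and would need the kind of case-by-case investigation discussed elsewhere in the thread.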

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-06 Thread Philip Jägenstedt
On Fri, 06 Apr 2012 15:42:26 +0200, Philip Jägenstedt phil...@opera.com wrote: These are the ranges that need more investigation. Sorry for the monologue, but investigate I did. These are the interesting ones: C6CF = opera-hk: U+FFFD � firefox: U+5EF4 廴 chrome: U+F6DF  firefox-hk:

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-04 Thread Anne van Kesteren
On Fri, 30 Mar 2012 14:00:38 +0200, Anne van Kesteren ann...@opera.com wrote: Ideally someone does detailed content analysis to figure out what the best path forward is here, though I'm not entirely sure how. I still don't know how, but thanks to Simon Pieters I gathered some URLs from

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-03-30 Thread Anne van Kesteren
On Wed, 28 Mar 2012 17:40:58 +0200, Philip Jägenstedt phil...@opera.com wrote: 1. What is the compatible subset of all browsers? 2. Does that subset include anything mapping to the PUA? The range IE and Chrome map to PUA in bytes (lead,trail) is 0x8140 to 0xA0FE and 0xC6A2 to 0xC8FE. The
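
As a rough illustration of the two byte ranges quoted in this message, the sketch below tests whether a (lead, trail) pair falls inside them. It reads each range as a comparison on the combined 16-bit value and reuses the trail-byte ranges given in the first message of the thread; both choices are assumptions, and the function and constant names are hypothetical.

```python
# Ranges quoted in the message, as (first, last) on the combined lead/trail value.
PUA_BYTE_RANGES = [(0x8140, 0xA0FE), (0xC6A2, 0xC8FE)]


def is_valid_trail(trail):
    """Trail bytes 0x40 to 0x7E and 0xA1 to 0xFE, as stated earlier in the thread."""
    return 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE


def in_pua_byte_range(lead, trail):
    """Return True if the pair lies in one of the quoted ranges (assumed reading)."""
    if not is_valid_trail(trail):
        return False
    code = (lead << 8) | trail
    return any(first <= code <= last for first, last in PUA_BYTE_RANGES)


print(in_pua_byte_range(0xC6, 0xA2))
print(in_pua_byte_range(0xC9, 0x40))
```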

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-03-29 Thread Anne van Kesteren
On Wed, 28 Mar 2012 17:40:58 +0200, Philip Jägenstedt phil...@opera.com wrote: Making big5 and big5-hkscs aliases sounds like a good idea, on the assumption that big5-hkscs is a pure extension of Big5. I believe they are not, but given that a) Windows treats them identically and b)

[whatwg] Encoding: big5 and big5-hkscs

2012-03-28 Thread Anne van Kesteren
I'm not sure what to do with big5 and big5-hkscs. After generating all possible byte sequences (lead bytes 0x81 to 0xFE, trail bytes 0x40 to 0x7E and 0xA1 to 0xFE) and getting the code points for those in various browsers there does not seem to be that much interoperability.
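
The enumeration described above can be sketched as follows. This is only an illustration of the byte ranges named in the message; the function name is hypothetical, and decoding with Python's `big5hkscs` codec stands in for "getting the code points in various browsers", so its results are not expected to match any browser's table.

```python
def big5_candidate_sequences():
    """Yield every two-byte sequence with a lead byte 0x81 to 0xFE and a
    trail byte 0x40 to 0x7E or 0xA1 to 0xFE."""
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    for lead in range(0x81, 0xFF):
        for trail in trail_bytes:
            yield bytes([lead, trail])


sequences = list(big5_candidate_sequences())
print(len(sequences))  # 126 lead bytes * 157 trail bytes = 19782

# Stand-in for a browser decoder: undecodable input becomes U+FFFD.
sample = sequences[0]
decoded = sample.decode("big5hkscs", errors="replace")
print(sample.hex().upper(), [f"U+{ord(ch):04X}" for ch in decoded])
```

The total of 19782 sequences equals the 18594 defined mappings plus 1188 U+FFFD entries reported for the mapping file proposed later in the thread.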

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-03-28 Thread Anne van Kesteren
On Wed, 28 Mar 2012 12:18:41 +0200, Anne van Kesteren ann...@opera.com wrote: I'm not sure what to do with big5 and big5-hkscs. After generating all possible byte sequences (lead bytes 0x81 to 0xFE, trail bytes 0x40 to 0x7E and 0xA1 to 0xFE) and getting the code points for those in various

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-03-28 Thread Philip Jägenstedt
On Wed, 28 Mar 2012 15:36:35 +0200, Anne van Kesteren ann...@opera.com wrote: On Wed, 28 Mar 2012 12:18:41 +0200, Anne van Kesteren ann...@opera.com wrote: I'm not sure what to do with big5 and big5-hkscs. After generating all possible byte sequences (lead bytes 0x81 to 0xFE, trail bytes