Re: Code pages and Unicode

2011-08-25 Thread Asmus Freytag
On 8/24/2011 7:45 PM, Richard Wordingham wrote: Which earlier coding system supported Welsh? (I'm thinking of 'W WITH CIRCUMFLEX', U+0174 and U+0175.) How was the use of the canonical decompositions incompatible with the character encodings of legacy systems? Latin-1 has the same codes as
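Richard's question about canonical decompositions can be checked directly. The Python sketch below uses only the standard `unicodedata` module and codecs. The choice of ISO 8859-14 (the Latin-8 "Celtic" part, which includes the Welsh letters) as the legacy comparison is this sketch's assumption; the thread does not name it. The sketch shows that U+0174 decomposes to W plus U+0302 COMBINING CIRCUMFLEX ACCENT, that Latin-1 cannot hold either form, and that ISO 8859-14 holds the precomposed letter.

```python
import unicodedata

def nfd_code_points(ch: str) -> list[int]:
    """Code points of the canonical decomposition (NFD) of ch."""
    return [ord(c) for c in unicodedata.normalize("NFD", ch)]

def encodable(text: str, codec: str) -> bool:
    """True if text can be encoded in the given legacy codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    for ch in ("\u0174", "\u0175"):  # W / w WITH CIRCUMFLEX
        decomposed = unicodedata.normalize("NFD", ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
        print("  NFD:", " ".join(f"U+{cp:04X}" for cp in nfd_code_points(ch)))
        print("  NFC restores it:", unicodedata.normalize("NFC", decomposed) == ch)
        for codec in ("latin-1", "iso8859_14"):
            print(f"  precomposed in {codec}:", encodable(ch, codec))
        # Latin-1 has the base letter W but no combining circumflex.
        print("  decomposed in latin-1:", encodable(decomposed, "latin-1"))
```

A legacy 8-bit set that only has precomposed letters cannot carry the decomposed sequence, which is one way such incompatibilities arise.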

RE: Code pages and Unicode

2011-08-25 Thread Erkki I Kolehmainen
On Behalf Of Asmus Freytag Sent: 25 August 2011 9:00 To: Richard Wordingham Cc: Ken Whistler; unicode@unicode.org Subject: Re: Code pages and Unicode On 8/24/2011 7:45 PM, Richard Wordingham wrote: Which earlier coding system supported Welsh? (I'm thinking of 'W WITH CIRCUMFLEX

RE: Code pages and Unicode

2011-08-24 Thread William_J_G Overington
On Tuesday 23 August 2011, Doug Ewell d...@ewellic.org wrote: Asmus Freytag asmusf at netcom dot com wrote: Until then, I find further speculation rather pointless and would love if it moved off this list (until such time). +1 -0.7 It is harmless fun, indeed it is fun that assists

Re: Re: Code pages and Unicode

2011-08-24 Thread Jean-François Colson
On 23 August 2011 21:44 Richard Wordingham richard.wording...@ntlworld.com wrote: On Tue, 23 Aug 2011 07:18:21 +0200 Jean-François Colson j...@colson.eu wrote: And what do you think about (H1,H2,VS1,L3,L4)? The L4 is unnecessary. The trick

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: (1) a plain-text file (2) using only plain-text conventions (i.e. not adding rich text) (3) which contains the same PUA code point with two meanings (4) using different fonts or other mechanisms (5) in a platform-independent,

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Doug Ewell
Luke-Jr luke at dashjr dot org wrote: Too bad the Conscript registry is censoring assignments the maintainer doesn't like for unspecified personal reasons, increasing the chances of an overlap. This isn't censorship, which would imply some sort of political, ethical, or moral agenda. This is

Re: Code pages and Unicode

2011-08-24 Thread John H. Jenkins
Asmus Freytag wrote on 23 Aug 2011 at 2:00 PM: Until then, I find further speculation rather pointless and would love if it moved off this list (until such time). That would be wonderful, because we could then turn our attention to more urgent subjects, such as what to do when the sun reaches

RE: Code pages and Unicode

2011-08-24 Thread Doug Ewell
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Until then, I find further speculation rather pointless and would love if it moved off this list (until such time). It is harmless fun, indeed it is fun that assists learning and understanding, and so as long as it

Re: Code pages and Unicode

2011-08-24 Thread Richard Wordingham
On Wed, 24 Aug 2011 08:02:42 -0700 Doug Ewell d...@ewellic.org wrote: But some people seem to be dead serious about the need to go beyond 1.1 million code points, and are making dead-serious arguments that we need to plan for it. Those are two different claims. 'Never say never' is a useful

Re: Code pages and Unicode

2011-08-24 Thread Ken Whistler
On 8/24/2011 10:48 AM, Richard Wordingham wrote: Those are two different claims. 'Never say never' is a useful maxim. So is 'Leave well enough alone.' The problem would be in using maxims instead of an analysis of engineering requirements to drive architectural decisions. The extension of

Re: Code pages and Unicode

2011-08-24 Thread Richard Wordingham
On Wed, 24 Aug 2011 12:40:54 -0700 Ken Whistler k...@sybase.com wrote: On 8/24/2011 10:48 AM, Richard Wordingham wrote: if, say, code points are squandered. Oh. Well, in that case, the correct action is to work to ensure that code points are not squandered. Have there not already

Re: Code pages and Unicode

2011-08-24 Thread John H. Jenkins
It has ceased to be. It's expired and gone to meet its maker. It's a stiff. Bereft of life, it rests in peace.… Its metabolic processes are now history. It's off the twig. It's kicked the bucket, it's shuffled off its mortal coil, run down the curtain and joined the bleedin' choir invisible.

Re: Code pages and Unicode

2011-08-24 Thread Ken Whistler
On 8/24/2011 3:51 PM, Richard Wordingham wrote: Well, in that case, the correct action is to work to ensure that code points are not squandered. Have there not already been several failures on that front? The BMP is littered with concessions to the limitations of rendering systems -

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Philippe Verdy
2011/8/24 Doug Ewell d...@ewellic.org: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: (1) a plain-text file (2) using only plain-text conventions (i.e. not adding rich text) (3) which contains the same PUA code point with two meanings (4) using different fonts or other mechanisms

Re: Code pages and Unicode

2011-08-24 Thread Philippe Verdy
2011/8/25 Richard Wordingham richard.wording...@ntlworld.com: It will only happen when the need becomes obvious, which may be never, or may be 30 years hence. It's even conceivable that UTF-16 will drop out of use. Conceivable but extremely unlikely because it will remain used in extremely

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Doug Ewell
-----Original Message----- From: Philippe Verdy verd...@wanadoo.fr Sender: unicode-bou...@unicode.org Date: Thu, 25 Aug 2011 02:10:27 To: Doug Ewell d...@ewellic.org Reply-To: verd...@wanadoo.fr Cc: unicode@unicode.org Subject: Re: Multiple private agreements (was: RE: Code pages and Unicode

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Doug Ewell
Subject: Re: Multiple private agreements (was: RE: Code pages and Unicode) Philippe wrote: But my initial suggestion implied that condition 3 was not part of it. This is not me, but sriva that has modified the problem. The problem was changed later by adding new conditions that I have never intended

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Philippe Verdy
2011/8/24 Doug Ewell d...@ewellic.org: As Richard said, and you probably already know, there is no chance that UTC will ever do anything with the PUA, especially anything that gives the appearance of endorsing its use. I'm just thankful they haven't deprecated it. The appearance of endorsing

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Andrew Cunningham
so you will end up with the CSUR AND the registry Philippe is suggesting AND all the existing uses of PUA that will not end up in CSUR or the other registry. sounds like it will be a mess. it's bad enough dealing with Unicode and pseudo-Unicode in the Myanmar script, adding PUA potentially into

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-24 Thread Philippe Verdy
2011/8/25 Andrew Cunningham lang.supp...@gmail.com: so you will end up with the CSUR AND the registry Philippe is suggesting AND all the existing uses of PUA that will not end up in CSUR or the other registry. sounds like it will be a mess. its bad enough dealing with Unicode and

Re: Code pages and Unicode

2011-08-23 Thread Richard Wordingham
On Mon, 22 Aug 2011 16:18:56 -0700 Ken Whistler k...@sybase.com wrote: How about Clause 12.5 of ISO/IEC 10646: 001B, 0025, 0040 You escape out of UTF-16 to ISO 2022, and then you can do whatever the heck you want, including exchange and processing of complete 4-byte forms, with all the

Re: Code pages and Unicode

2011-08-23 Thread Asmus Freytag
On 8/23/2011 12:00 PM, Richard Wordingham wrote: On Mon, 22 Aug 2011 16:18:56 -0700 Ken Whistler k...@sybase.com wrote: How about Clause 12.5 of ISO/IEC 10646: 001B, 0025, 0040 You escape out of UTF-16 to ISO 2022, and then you can do whatever the heck you want, including exchange and

RE: Code pages and Unicode

2011-08-23 Thread Doug Ewell
Asmus Freytag asmusf at netcom dot com wrote: Until then, I find further speculation rather pointless and would love if it moved off this list (until such time). +1 -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-23 Thread Doug Ewell
srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote: If same codes within PUA becomes standard for different purposes, They aren't standard. Two different private agreements could assign different characters to the same PUA code points. how to get both working using same font? You

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-23 Thread Philippe Verdy
2011/8/23 Doug Ewell d...@ewellic.org: srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote: If same codes within PUA becomes standard for different purposes, They aren't standard. Two different private agreements could assign different characters to the same PUA code points. how

RE: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-23 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: There's no standard way to specify even one font or private agreement in plain text, let alone how to switch between them within the same document. This is not an intended use of the PUA. There exists such standard in the context

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-23 Thread Philippe Verdy
2011/8/24 Doug Ewell d...@ewellic.org: Coordinating private agreements so they don't conflict is clearly the ideal situation. But many different people and organizations have already claimed the same chunk of PUA space, as Richard exemplified yesterday with his Taiwan/Hong Kong example.

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-23 Thread Luke-Jr
On Tuesday, August 23, 2011 10:29:58 PM Philippe Verdy wrote: 2011/8/24 Doug Ewell d...@ewellic.org: (3) which contains the same PUA code point with two meanings The only numbered item to sacrifice is number (3) here. that's the case where separate PUA agreements are still coordinated so that

Re: Multiple private agreements (was: RE: Code pages and Unicode)

2011-08-23 Thread Philippe Verdy
2011/8/24 Luke-Jr l...@dashjr.org: On Tuesday, August 23, 2011 10:29:58 PM Philippe Verdy wrote: 2011/8/24 Doug Ewell d...@ewellic.org: (3) which contains the same PUA code point with two meanings The only numbered item to sacrifice is number (3) here. that's the case where separate PUA

Re: Code pages and Unicode

2011-08-22 Thread Andrew West
On 21 August 2011 02:14, Richard Wordingham richard.wording...@ntlworld.com wrote: On Fri, 19 Aug 2011 17:03:41 -0700 Ken Whistler k...@sybase.com wrote: O.k., so apparently we have a while to go before we have to start worrying about the Y2K or IPv4 problem for Unicode. Call me again in the

Re: Code pages and Unicode

2011-08-22 Thread Shriramana Sharma
On 08/22/2011 03:05 PM, Andrew West wrote: Can anyone think of a way to extend UTF-16 without adding new surrogates or inventing a new general category? Why would anyone *need* to do so? UTF-16 can represent all code points up to Plane 16, right? -- Shriramana Sharma
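Shriramana's point follows from the surrogate-pair arithmetic: a high surrogate carries 10 bits and a low surrogate another 10, so a pair reaches exactly U+10000 through U+10FFFF (planes 1 to 16) and nothing beyond. A minimal Python sketch; the function name `utf16_units` is illustrative, and the result can be cross-checked against Python's own `utf-16-be` codec.

```python
def utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                    # BMP: a single code unit
    v = cp - 0x10000                   # 20 bits left to distribute
    high = 0xD800 + (v >> 10)          # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)         # bottom 10 bits -> low (trail) surrogate
    return [high, low]

if __name__ == "__main__":
    # The highest pair, DBFF DFFF, encodes U+10FFFF, the last code point of plane 16.
    print([f"{u:04X}" for u in utf16_units(0x10FFFF)])  # ['DBFF', 'DFFF']
    # 1024 high x 1024 low surrogates = 16 planes of 65,536 code points.
    print(1024 * 1024)  # 1048576
```

Anything above U+10FFFF has no pair to land on, which is why extending UTF-16 needs new surrogates or some other new mechanism.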

Re: Code pages and Unicode

2011-08-22 Thread Andrew West
On 22 August 2011 12:51, Shriramana Sharma samj...@gmail.com wrote: On 08/22/2011 03:05 PM, Andrew West wrote: Can anyone think of a way to extend UTF-16 without adding new surrogates or inventing a new general category? Why would anyone *need* to do so? UTF-16 can represent all codepoints

RE: Code pages and Unicode

2011-08-22 Thread Doug Ewell
srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote: The true lifting of UTF-16 would be to UTF-32. Leave the UTF-16 untouched and make the new half as versatile as possible. I think any other solution is just a patch-up for the time being. There is no evidence whatsoever that this

Re: Code pages and Unicode

2011-08-22 Thread John H. Jenkins
Christoph Päper wrote on 20 Aug 2011 at 2:31 AM: Mark Davis ☕: Under the original design principles of Unicode, the goal was a bit more limited; we envisioned […] a generative mechanism for infrequent CJK ideographs, I'd still like having that as an option. Et voilà! We have Ideographic
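The snippet breaks off at "Ideographic"; presumably John means Ideographic Description Sequences, in which an Ideographic Description Character (starting at U+2FF0) is followed by components to describe, not encode, an ideograph. A small Python sketch using the familiar example of 木 beside 木 describing 林; the helper `binary_ids` is hypothetical, and only the two-component IDCs of the 2011 repertoire are handled (later Unicode versions added more IDCs).

```python
import unicodedata

# Ideographic Description Characters that take two components.
# U+2FF2 and U+2FF3 take three and are left out of this sketch.
BINARY_IDCS = {chr(cp) for cp in (0x2FF0, 0x2FF1, *range(0x2FF4, 0x2FFC))}

def binary_ids(idc: str, first: str, second: str) -> str:
    """Build a two-component Ideographic Description Sequence (prefix order)."""
    if idc not in BINARY_IDCS:
        raise ValueError(f"U+{ord(idc):04X} is not a two-component IDC")
    return idc + first + second

if __name__ == "__main__":
    ids = binary_ids("\u2ff0", "\u6728", "\u6728")  # LEFT TO RIGHT: 木 + 木
    print(ids, [unicodedata.name(c) for c in ids])
    # The sequence only describes 林 (U+6797); normalization does not equate them.
    print(unicodedata.normalize("NFC", ids) == "\u6797")  # False
```

Because an IDS is a description rather than an encoding, it costs no code points, but it also gives software no identity to compare against the encoded character.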

Re: Code pages and Unicode

2011-08-22 Thread William_J_G Overington
On Monday 22 August 2011, Andrew West andrewcw...@gmail.com wrote: Can anyone think of a way to extend UTF-16 without adding new surrogates or inventing a new general category? Andrew How about a triple sequence of two high surrogates followed by one low surrogate? I suggest this as a

Re: Code pages and Unicode

2011-08-22 Thread Jean-François Colson
On 22/08/11 16:55, Doug Ewell wrote: srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote: The true lifting of UTF-16 would be to UTF-32. Leave the UTF-16 untouched and make the new half as versatile as possible. I think any other solution is just a patch-up for the time being. There

Re: Code pages and Unicode

2011-08-22 Thread Jean-François Colson
On 20/08/11 02:03, Ken Whistler wrote: O.k., so apparently we have a while to go before we have to start worrying about the Y2K or IPv4 problem for Unicode. Call me again in the year 2851, and we'll still have 5 years left to design a new scheme and plan for the transition. ;-) --Ken I

Re: Code pages and Unicode

2011-08-22 Thread Ken Whistler
On 8/22/2011 9:58 AM, Jean-François Colson wrote: I wonder whether you aren’t a little too optimistic. No. If anything I'm assuming that the folks working on proposals will be amazingly assiduous during the next decade. Have you considered the unencoded ideographic scripts? Why, yes I

Re: Code pages and Unicode

2011-08-22 Thread Richard Wordingham
On Mon, 22 Aug 2011 14:06:00 +0100 (BST) William_J_G Overington wjgo_10...@btinternet.com wrote: On Monday 22 August 2011, Andrew West andrewcw...@gmail.com wrote: Can anyone think of a way to extend UTF-16 without adding new surrogates or inventing a new general category? Andrew

Re: Code pages and Unicode

2011-08-22 Thread Ken Whistler
On 8/22/2011 3:15 PM, Richard Wordingham wrote: On Monday 22 August 2011, Andrew West andrewcw...@gmail.com wrote: Can anyone think of a way to extend UTF-16 without adding new surrogates or inventing a new general category? Andrew How about a triple sequence of two

Re: Code pages and Unicode

2011-08-22 Thread Jean-François Colson
On 23/08/11 00:15, Richard Wordingham wrote: The problem is that a search for the character represented by the code unit sequence (H2,L3) would also pick up the sequence (H1,H2,L3). While there is no ambiguity, it does make searching more complicated to code. The same issue applies to the
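Richard's objection can be made concrete. Under the proposed (hypothetical, never adopted) scheme where an extended character is written H1,H2,L3, a plain code-unit search for the pair H2,L3 also matches inside the triple, so the searcher has to look one unit back to reject such hits. A Python sketch with made-up surrogate values; the function names are illustrative, and the look-back assumes the needle is a complete character.

```python
# Hypothetical values standing in for H1, H2 and L3 in the thread.
H1, H2, L3 = 0xD800, 0xD801, 0xDC00

def is_high(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF

def naive_find(text: list[int], needle: list[int]) -> list[int]:
    """Every index where needle occurs as a run of code units."""
    n = len(needle)
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == needle]

def boundary_find(text: list[int], needle: list[int]) -> list[int]:
    """Like naive_find, but rejects hits that begin right after a high
    surrogate, i.e. hits sitting inside a (hypothetical) H,H,L triple."""
    return [i for i in naive_find(text, needle)
            if i == 0 or not is_high(text[i - 1])]

if __name__ == "__main__":
    # "A", pair (H2,L3), "B", triple (H1,H2,L3)
    text = [0x41, H2, L3, 0x42, H1, H2, L3]
    print(naive_find(text, [H2, L3]))     # [1, 5]  (5 is a false hit)
    print(boundary_find(text, [H2, L3]))  # [1]
```

In well-formed UTF-16 a high surrogate is never followed by another high surrogate, so today's searchers never need this extra check; that is the added complexity Richard describes.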

Re: Code pages and Unicode

2011-08-20 Thread srivas sinnathurai
About the research works. I alone (with my colleagues) am researching the fact that Sumerian is Tamil / Tamil is Sumerian This requires quite a lot of space. Additionally I do research on Tamil alphabet as based on scientific definitions and it only represents the mechanical parts, ie only

Re: Code pages and Unicode

2011-08-20 Thread Doug Ewell
for a character encoding. -- Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell From: srivas sinnathurai Sent: Saturday, August 20, 2011 3:35 To: Christoph Päper Cc: unicode@unicode.org Subject: Re: Code pages and Unicode

Re: Code pages and Unicode

2011-08-20 Thread Richard Wordingham
On Fri, 19 Aug 2011 17:03:41 -0700 Ken Whistler k...@sybase.com wrote: O.k., so apparently we have a while to go before we have to start worrying about the Y2K or IPv4 problem for Unicode. Call me again in the year 2851, and we'll still have 5 years left to design a new scheme and plan for the

Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Doug Ewell
srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote: PUA is not structured It's not supposed to be. It's a private-use area. You use it the way you see fit. and not officially programmable to accommodate numerous code pages. None of Unicode is designed around code-page switching
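As Doug says, the PUA carries no structure: Unicode reserves three ranges and gives them the General Category Co, and nothing in a code point tells a program which private agreement applies. A short Python sketch listing those ranges and checking them against the standard library's character database; the names `PUA_RANGES`, `is_private_use` and `pua_size` are illustrative.

```python
import unicodedata

# The three Private Use ranges; Unicode assigns them no meaning beyond
# General Category "Co" (private use).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # plane 15: Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # plane 16: Supplementary Private Use Area-B
]

def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

def pua_size() -> int:
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)

if __name__ == "__main__":
    print(pua_size())  # 137468
    # The standard library agrees at each range's endpoints.
    for lo, hi in PUA_RANGES:
        assert unicodedata.category(chr(lo)) == unicodedata.category(chr(hi)) == "Co"
    # The last two code points of each plane are noncharacters, not private use.
    print(unicodedata.category(chr(0xFFFFE)))  # Cn
```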

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread srivas sinnathurai
sisrivas at blueyonder dot co dot uk wrote: PUA is not structured It's not supposed to be. It's a private-use area. You use it the way you see fit. and not officially programmable to accommodate numerous code pages. None of Unicode is designed around code-page switching. It's a flat code

RE: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Doug Ewell
srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote: Why this suggestion? With current flat space, one code point is only allocated to one and only one purpose. We can run out of code space soon. Argument over. There are not 800,000 more characters that need to be encoded for

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread John H. Jenkins
srivas sinnathurai wrote on 19 Aug 2011 at 9:40 AM: Why this suggestion? With current flat space, one code point is only allocated to one and only one purpose. We can run out of code space soon. There are a couple of problems here. We currently have over 860,000 unassigned code points. Surveys

Re: Code pages and Unicode

2011-08-19 Thread Christoph Päper
John H. Jenkins: there would have to be a *lot* of writing systems out there we don't know about to fill up planes 4 through 14 That’s quite possible, though, the universe is huge. The question rather is whether we will ever know about them. It’s quite possible we won’t.

RE: Code pages and Unicode

2011-08-19 Thread Doug Ewell
Maybe we should step back a bit: I'm not calling for any change to existing major allocations. However, this is about time we allocate (not PUA) large number of codes to a code page based sub codes so that not only all 7000+ languages can Freely use it without INTERFERENCE from Unicode and

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Michael Everson
On 19 Aug 2011, at 18:24, John H. Jenkins wrote: We currently have over 860,000 unassigned code points. Surveys of all known writing systems indicate that only a small fraction of these will be needed. Indeed, although it looks likely that Han will spill out of the SIP into plane 3, all

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Mark E. Shoulson
On 08/19/2011 01:24 PM, John H. Jenkins wrote: In order to get the UTC and WG2 to agree to a major architectural change such as you're suggesting, you'd have to have some very solid evidence that it's needed—not an interesting idea, not potentially useful, but seriously *needed*. That's how

RE: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Doug Ewell
Mark E. Shoulson mark at kli dot org wrote: And indeed, it went the other way too, back when ISO-10646 had not 17, but 65536 *planes* and someone provided some reasonable evidence (or just plain reasoned arguments) that 4.3 *billion* characters was probably overkill. Technically, I think
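The figures traded in this exchange (4.3 billion, 2.1 billion, "a little over a million") all follow from planes of 65,536 code points each. A few lines of Python reproduce them; the 128-groups-of-256-planes structure of the original UCS-4 space is noted as background.

```python
PLANE = 0x10000  # code points per plane: 256 rows x 256 cells

spaces = {
    "65,536 planes": 65536 * PLANE,
    "32,768 planes (128 groups x 256 planes)": 32768 * PLANE,
    "17 planes (what UTF-16 can reach)": 17 * PLANE,
}

for label, size in spaces.items():
    print(f"{label:40} {size:>15,}")

# 32,768 planes is exactly 2**31 code points, so the highest one,
# 0x7FFFFFFF, is also the largest value of a signed 32-bit integer.
assert 32768 * PLANE == 2**31
print(hex(32768 * PLANE - 1))  # 0x7fffffff
```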

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Jukka K. Korpela
20.8.2011 0:07, Doug Ewell wrote: Of course, 2.1 billion characters is also overkill, but the advent of UTF-16 was how we ended up with 17 planes. And now we think that a little over a million is enough for everyone, just as they thought in the late 1980s that 16 bits is enough for everyone.

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Mark E. Shoulson
On 08/19/2011 05:07 PM, Doug Ewell wrote: Mark E. Shoulson mark at kli dot org wrote: And indeed, it went the other way too, back when ISO-10646 had not 17, but 65536 *planes* and someone provided some reasonable evidence (or just plain reasoned arguments) that 4.3 *billion* characters was

Re: Code pages and Unicode

2011-08-19 Thread Benjamin M Scarborough
On 20 Aug 2011, at 00:35, Jukka K. Korpela wrote: And now we think that a little over a million is enough for everyone, just as they thought in the late 1980s that 16 bits is enough for everyone. Whenever somebody talks about needing 31 bits for Unicode, I always think of the hypothetical

RE: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Doug Ewell
Jukka K. Korpela jkorpela at cs dot tut dot fi wrote: And now we think that a little over a million is enough for everyone, just as they thought in the late 1980s that 16 bits is enough for everyone. I know this is an enjoyable exercise — people love to ridicule Bill Gates for his comment in

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Ken Whistler
On 8/19/2011 2:07 PM, Doug Ewell wrote: Technically, I think 10646 was always limited to 32,768 planes so that one could always address a code point with a 32-bit signed integer (a nod to the Java fans). Well, yes, but it didn't really have anything to do with Java. Remember that Java wasn't

Re: Code pages and Unicode

2011-08-19 Thread John H. Jenkins
Benjamin M Scarborough wrote on 19 Aug 2011 at 3:53 PM: Whenever somebody talks about needing 31 bits for Unicode, I always think of the hypothetical situation of discovering some extraterrestrial civilization and trying to add all of their writing systems to Unicode. I imagine there would be

Re: Code pages and Unicode

2011-08-19 Thread Ken Whistler
On 8/19/2011 2:53 PM, Benjamin M Scarborough wrote: Whenever somebody talks about needing 31 bits for Unicode, I always think of the hypothetical situation of discovering some extraterrestrial civilization and trying to add all of their writing systems to Unicode. I imagine there would be

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Asmus Freytag
On 8/19/2011 2:35 PM, Jukka K. Korpela wrote: 20.8.2011 0:07, Doug Ewell wrote: Of course, 2.1 billion characters is also overkill, but the advent of UTF-16 was how we ended up with 17 planes. And now we think that a little over a million is enough for everyone, just as they thought in the

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Asmus Freytag
On 8/19/2011 3:24 PM, Ken Whistler wrote: On 8/19/2011 2:07 PM, Doug Ewell wrote: Technically, I think 10646 was always limited to 32,768 planes so that one could always address a code point with a 32-bit signed integer (a nod to the Java fans). Well, yes, but it didn't really have anything