RE: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-05, Doug Ewell via Unicode
Martin J. Dürst wrote:
> Assuming (conservatively) that it will take about a century to fill up
> all 17 (well, actually 15, because two are private) planes, this would
> give us another century.
Current estimates seem to indicate that 800 years is closer to the mark.
--
Doug Ewell | Thornton,
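The plane arithmetic behind "17 (well, actually 15, because two are private) planes" can be checked in a few lines. This is a minimal sketch for illustration. `code_space_summary` is a hypothetical helper. The counts are raw: they do not subtract surrogates, the BMP's own Private Use Area, or noncharacters, so they are upper bounds on assignable code points.

```python
# Sketch: raw code-space arithmetic for the "17 planes, 15 non-private" remark.
# code_space_summary is a hypothetical helper written for this illustration.
PLANE_SIZE = 0x10000   # 65,536 code points per plane
TOTAL_PLANES = 17      # planes 0..16, i.e. U+0000..U+10FFFF
PRIVATE_PLANES = 2     # planes 15 and 16 are Supplementary Private Use planes


def code_space_summary() -> dict:
    """Return raw code point counts.

    Surrogates, the BMP Private Use Area and noncharacters are NOT
    subtracted, so these figures are upper bounds.
    """
    non_private_planes = TOTAL_PLANES - PRIVATE_PLANES
    return {
        "total": TOTAL_PLANES * PLANE_SIZE,
        "non_private_planes": non_private_planes,
        "non_private": non_private_planes * PLANE_SIZE,
    }


if __name__ == "__main__":
    summary = code_space_summary()
    print(f"{summary['total']:,} code points in {TOTAL_PLANES} planes")
    print(f"{summary['non_private']:,} code points in "
          f"{summary['non_private_planes']} non-private planes")
```

Run as a script, it reports 1,114,112 code points in all 17 planes and 983,040 in the 15 non-private ones.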

2017-06-05, William_J_G Overington via Unicode
Martin J. Dürst
> Sorry to be late with this, but if 20.1 bits turn out to not be enough,
> what about 21 bits?
Martin J. Dürst
> That would still limit UTF-8 to four bytes, but would almost double the
> code space. Assuming (conservatively) that it will take about a century
> to fill up all 17

2017-06-05, Richard Wordingham via Unicode
On Mon, 5 Jun 2017 13:08:06 +0900 "Martin J. Dürst via Unicode" wrote:
> On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> > Richard Wordingham wrote:
> >
> >> even supporting 6-byte patterns just in case 20.1 bits eventually
> >> turn out not to be enough,
> >

2017-06-04, David Starner via Unicode
On Sun, Jun 4, 2017 at 9:13 PM Martin J. Dürst via Unicode <unicode@unicode.org> wrote:
> Sorry to be late with this, but if 20.1 bits turn out to not be enough,
> what about 21 bits?
>
> That would still limit UTF-8 to four bytes, but would almost double the
> code space. Assuming

2017-06-04, Martin J. Dürst via Unicode
On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> Richard Wordingham wrote:
>> even supporting 6-byte patterns just in case 20.1 bits eventually
>> turn out not to be enough,
Sorry to be late with this, but if 20.1 bits turn out to not be enough,
what about 21 bits? That would still limit
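The "20.1 bits" and "almost double" figures can be checked directly. The current space U+0000..U+10FFFF has 0x110000 code points, and log2 of that is about 20.09. A four-byte UTF-8 pattern carries exactly 21 payload bits, which is why a 21-bit space would still fit in four bytes. This is an illustrative sketch only; `bits_needed` and `utf8_payload_bits` are hypothetical helper names.

```python
import math

# Sketch: bits needed by the current code space, and what a full 21-bit
# space (the "what about 21 bits?" suggestion) would add.
CURRENT_LIMIT = 0x110000   # U+0000..U+10FFFF: 1,114,112 code points
FULL_21_BITS = 1 << 21     # 0x200000: 2,097,152 values


def bits_needed(n_values: int) -> float:
    """log2 of the number of code points: the '20.1 bits' figure."""
    return math.log2(n_values)


def utf8_payload_bits(n_bytes: int) -> int:
    """Payload bits in an n-byte UTF-8-style pattern (n = 1..6).

    A 1-byte form is 0xxxxxxx (7 bits). For n >= 2 the lead byte is n
    one-bits, a zero, then (7 - n) payload bits; each of the n - 1
    continuation bytes (10xxxxxx) carries 6 more.
    """
    if n_bytes == 1:
        return 7
    return (7 - n_bytes) + 6 * (n_bytes - 1)


if __name__ == "__main__":
    print(f"current space: {bits_needed(CURRENT_LIMIT):.2f} bits")
    print(f"4-byte payload: {utf8_payload_bits(4)} bits")
    print(f"21-bit space / current: {FULL_21_BITS / CURRENT_LIMIT:.3f}")
```

The ratio works out to exactly 32/17, about 1.88, which matches "almost double".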

2017-06-01, Ken Whistler via Unicode
On 6/1/2017 8:32 PM, Richard Wordingham via Unicode wrote:
> TUS Section 3 is like the Augean Stables. It is a complete mess as a
> standards document,
That is a matter of editorial taste, I suppose.
> imputing mental states to computing processes.
That, however, is false. The rhetorical turn

2017-06-01, Richard Wordingham via Unicode
On Thu, 1 Jun 2017 19:19:51 -0700 Ken Whistler via Unicode wrote:
> > and therefore should start a
> > sequence of 6 characters.
>
> That is completely false, and has nothing to do with the current
> definition of UTF-8.
>
> The current, normative definition of UTF-8,
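Under the current definition (the table of well-formed UTF-8 byte sequences in TUS Chapter 3, Table 3-7, which agrees with RFC 3629), the bytes C0, C1 and F5..FF never occur in well-formed UTF-8. So FC cannot begin any sequence at all, let alone a 6-byte one. The sketch below shows that classification. `utf8_byte_role` and its labels are hypothetical names chosen for illustration.

```python
def utf8_byte_role(b: int) -> str:
    """Classify a byte value under current UTF-8 (TUS Table 3-7 / RFC 3629).

    Labels are ad hoc for this sketch. A valid lead byte still restricts
    the byte after it (e.g. E0 needs A0..BF, F4 needs 80..8F).
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "ascii"          # complete 1-byte sequence
    if b <= 0xBF:
        return "continuation"   # 80..BF never start a sequence
    if b <= 0xC1:
        return "never"          # C0, C1 could only start overlong forms
    if b <= 0xDF:
        return "lead-2"
    if b <= 0xEF:
        return "lead-3"
    if b <= 0xF4:
        return "lead-4"
    return "never"              # F5..FF, including the old 5/6-byte leads F8..FD


if __name__ == "__main__":
    for b in (0x41, 0x80, 0xC0, 0xE2, 0xF4, 0xF5, 0xFC, 0xFF):
        print(f"{b:02X}: {utf8_byte_role(b)}")
```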

2017-06-01, Ken Whistler via Unicode
On 6/1/2017 6:21 PM, Richard Wordingham via Unicode wrote:
> > By definition D39b, either sequence of bytes, if encountered by a
> > conformant UTF-8 conversion process, would be interpreted as a sequence
> > of 6 maximal subparts of an ill-formed subsequence.
> ("D39b" is a typo for "D93b".)
Sorry about
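D93b defines a maximal subpart of an ill-formed subsequence. The substitution practice described in TUS Section 3.9 emits one U+FFFD per maximal subpart. Under that practice, both FC 80 80 80 80 80 and FF FF FF FF FF FF decode to six U+FFFD characters, because FC, FF and a stray 80 each form a maximal subpart of length one. The following is a minimal sketch of that practice, not a reference implementation. `decode_maximal_subparts` and its helpers are hypothetical names, and the byte ranges follow Table 3-7.

```python
REPLACEMENT = "\uFFFD"


def _sequence_length(lead: int) -> int:
    """Expected sequence length for a lead byte, or 0 if it cannot start one."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # 80..BF, C0, C1, F5..FF


def _second_byte_range(lead: int) -> tuple[int, int]:
    """Allowed range for the byte after the lead byte (TUS Table 3-7)."""
    if lead == 0xE0:
        return 0xA0, 0xBF  # excludes overlong 3-byte forms
    if lead == 0xED:
        return 0x80, 0x9F  # excludes surrogates D800..DFFF
    if lead == 0xF0:
        return 0x90, 0xBF  # excludes overlong 4-byte forms
    if lead == 0xF4:
        return 0x80, 0x8F  # excludes values above U+10FFFF
    return 0x80, 0xBF


def decode_maximal_subparts(data: bytes) -> str:
    """Decode UTF-8, replacing each maximal subpart with one U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        length = _sequence_length(lead)
        if length == 0:
            # A byte that cannot start a sequence is a maximal subpart of length 1.
            out.append(REPLACEMENT)
            i += 1
            continue
        if length == 1:
            out.append(chr(lead))
            i += 1
            continue
        # Extend over trail bytes as long as they stay on a well-formed path.
        j = i + 1
        for k in range(1, length):
            lo, hi = _second_byte_range(lead) if k == 1 else (0x80, 0xBF)
            if j < n and lo <= data[j] <= hi:
                j += 1
            else:
                break
        if j - i == length:
            out.append(data[i:j].decode("utf-8"))  # well-formed: decode normally
        else:
            out.append(REPLACEMENT)  # data[i:j] is one maximal subpart
        i = j  # the failing byte, if any, is examined afresh
    return "".join(out)


if __name__ == "__main__":
    for seq in (bytes.fromhex("FC8080808080"), bytes.fromhex("FFFFFFFFFFFF")):
        result = decode_maximal_subparts(seq)
        print(seq.hex(" ").upper(), "->", len(result), "x U+FFFD")
```

The truncated but otherwise valid prefix F0 9F 98, for contrast, is a single maximal subpart and yields only one U+FFFD.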

2017-06-01, Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote:
> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D39b, either sequence of

2017-06-01, Richard Wordingham via Unicode
On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode wrote:
> On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
> > You were implicitly invited to argue that there was no need to
> > handle 5 and 6 byte invalid sequences.
> >
>
> Well, working from the

2017-06-01, Ken Whistler via Unicode
On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
> You were implicitly invited to argue that there was no need to handle 5
> and 6 byte invalid sequences.
Well, working from the *current* specification:
FC 80 80 80 80 80
and
FF FF FF FF FF FF
are equal trash, uninterpretable as

2017-06-01, Philippe Verdy via Unicode
This is still very unlikely to occur. There is a lot of discussion about emojis, but they still don't account for much of the total. The major updates were expected for CJK sinograms, but even the rate of those updates has slowed down; we will eventually have another sinographic plane, but it will not come soon

2017-06-01, Richard Wordingham via Unicode
On Thu, 01 Jun 2017 12:54:45 -0700 Doug Ewell via Unicode wrote:
> Richard Wordingham wrote:
>
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
>
> Oh, gosh, here we go with this.
You were implicitly invited to argue
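Some context on "6-byte patterns": the original UTF-8 specification, RFC 2279, allowed sequences of up to six bytes covering 31 bits. F8..FB led 5-byte forms and FC..FD led 6-byte forms. RFC 3629 later restricted UTF-8 to U+10FFFF and four bytes. The sketch below reproduces the old bit layout for illustration only; `legacy_utf8_encode` is a hypothetical name, and its output above U+10FFFF is not valid UTF-8 today. It also shows that the smallest value needing six bytes was 0x4000000, encoded as FC 84 80 80 80 80. FC 80 80 80 80 80 was therefore already a non-shortest (overlong) form even under RFC 2279.

```python
def legacy_utf8_encode(cp: int) -> bytes:
    """Encode cp with the obsolete RFC 2279 layout (1 to 6 bytes, 31 bits).

    Illustration only: no surrogate check is made, and anything above
    U+10FFFF is not valid UTF-8 under RFC 3629 / current TUS.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("RFC 2279 covered only 0..0x7FFFFFFF")
    if cp < 0x80:
        return bytes([cp])
    # (sequence length, exclusive upper bound, lead-byte marker bits)
    for n_bytes, limit, marker in (
        (2, 1 << 11, 0xC0),
        (3, 1 << 16, 0xE0),
        (4, 1 << 21, 0xF0),
        (5, 1 << 26, 0xF8),
        (6, 1 << 31, 0xFC),
    ):
        if cp < limit:
            break
    # Peel off 6 bits per continuation byte, least significant first.
    tail = []
    for _ in range(n_bytes - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Remaining high bits go into the lead byte.
    return bytes([marker | cp] + tail[::-1])


if __name__ == "__main__":
    for value in (0x20AC, 0x10FFFF, 0x3FFFFFF, 0x4000000, 0x7FFFFFFF):
        print(f"{value:#010x}: {legacy_utf8_encode(value).hex(' ').upper()}")
```

For values up to U+10FFFF (other than surrogates), the output coincides with today's UTF-8. Only the 5- and 6-byte rows are obsolete.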