On Mon, 5 Jun 2017 13:08:06 +0900 "Martin J. Dürst via Unicode" <unicode@unicode.org> wrote:
> On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> > Richard Wordingham wrote:
> >
> >> even supporting 6-byte patterns just in case 20.1 bits eventually
> >> turn out not to be enough,
>
> Sorry to be late with this, but if 20.1 bits turn out to not be
> enough, what about 21 bits?
>
> That would still limit UTF-8 to four bytes, but would almost double
> the code space. Assuming (conservatively) that it will take about a
> century to fill up all 17 (well, actually 15, because two are
> private) planes, this would give us another century.

It all depends on how the lead byte is parsed. With a block-if
construct ignorant of the original design or a look-up table, it may be
simplest to treat F5 onwards as out and out errors and not expect any
trailing bytes. Code handling attempts at 6-byte code points was the
most complex case.

Of course, one **might** want to handle a list of mostly small positive
integers, at which point the old UTF-8 design might be useful.

Richard.
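
For illustration only, here is a rough Python sketch of the "block-if" style of lead-byte parsing mentioned above. The function name `expected_length` and the `max_lead` parameter are hypothetical, made up for this example and not taken from any real decoder. With the default `max_lead=0xF4` it treats F5 onwards as outright errors; passing `max_lead=0xF7` models the suggested full 21-bit, four-byte range, which is an assumption about how such an extension might look, not anything in the standard. The two constants just check the arithmetic behind "20.1 bits" and "almost double".

```python
import math

CODE_SPACE = 0x110000        # 17 planes of 0x10000 code points each
FOUR_BYTE_SPACE = 1 << 21    # 3 + 6 + 6 + 6 payload bits in a 4-byte sequence

BITS_NEEDED = math.log2(CODE_SPACE)           # about 20.09, the "20.1 bits"
GROWTH_FACTOR = FOUR_BYTE_SPACE / CODE_SPACE  # about 1.88, "almost double"


def expected_length(lead: int, max_lead: int = 0xF4) -> int:
    """Return the sequence length implied by a lead byte, or 0 on error.

    Hypothetical helper: a plain if-chain that knows nothing about the
    5- and 6-byte forms of the original UTF-8 design. It only looks at
    the lead byte; trailing-byte range checks are a separate step.
    """
    if lead < 0x80:
        return 1              # ASCII
    if lead < 0xC2:
        return 0              # 80..BF are trailing bytes; C0, C1 are overlong
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead <= max_lead:
        return 4              # F0..F4 today; F0..F7 if all 21 bits were used
    return 0                  # F5 (or F8) onwards: error, expect no trailing bytes
```

A look-up table indexed by the lead byte would encode the same decision; in either form the choice between F4 and F7 is a single boundary value, which is why rejecting F5 onwards costs so little.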