On Thu, 01 Jun 2017 12:54:45 -0700 Doug Ewell via Unicode <unicode@unicode.org> wrote:

> Richard Wordingham wrote:
>
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
>
> Oh, gosh, here we go with this.

You were implicitly invited to argue that there was no need to handle 5
and 6 byte invalid sequences.

> What will we do if 31 bits turn out not to be enough?

A compatible extension of UTF-16 to unbounded length has already been
designed. Prefix bytes 0xFF can be used to extend the length for UTF-8
by 8 bytes at a time. Extending UTF-32 is not beyond the wit of man,
and we know that UTF-16 could have been done better if the need had
been foreseen. While it seems natural to hold a Unicode scalar value in
a single machine word of some length, this is not necessary, just
highly convenient.

In short, it won't be a big problem intrinsically. The UCD may get a
bit unwieldy, which may be a problem for small systems without Internet
access.

Richard.