Re: Unicode, ISO/IEC 10646 Synchronization Issues for UTF-8

Rich Felker Tue, 24 Apr 2007 14:23:19 -0700

On Tue, Apr 24, 2007 at 04:43:59PM -0400, ＳｒｉｎＴｕａｒ wrote:
> Basically, it's a proposal to cap at 10FFFF.
> 
> I see no reason to cap utf-8 and utf-32 just to deal with the
> limitations of utf-16.
> 
> As long as you don't attempt to convert to utf-16, it should not be a
> problem. (and eventually, utf-16 should be phased out)


Capping is a good thing, and 21 bits is exactly the point you want to
cap at. Not only does it ensure that the table indices required for
UCS support can't grow unmanageably large; it also ensures that UTF-8
is never larger than UTF-32 (at most 4 bytes per character either
way), so that conversion can be done in place in situations where
storage space is limited.
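
To illustrate the in-place point, here is a rough sketch (not from any
particular library) of converting UTF-32 to UTF-8 inside a single
buffer. The names utf8_len and utf32le_to_utf8_inplace are made up for
this example, and little-endian UTF-32 is an assumption. It only works
because the cap guarantees at most 4 output bytes per 4-byte input
unit, so the write position can never pass the read position.

```python
MAX_CODE_POINT = 0x10FFFF  # the cap under discussion (fits in 21 bits)


def utf8_len(cp):
    """Bytes UTF-8 needs for code point cp; never more than 4 under the cap."""
    if cp < 0 or cp > MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid code point: %#x" % cp)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def utf32le_to_utf8_inplace(buf):
    """Convert UTF-32LE held in bytearray buf to UTF-8 in the same buffer.

    Returns the new length. Hypothetical helper, assumes little-endian
    input. On invalid input it raises and leaves buf partly converted.
    """
    if len(buf) % 4:
        raise ValueError("UTF-32 input must be a multiple of 4 bytes")
    w = 0  # write position; stays <= the read position r
    for r in range(0, len(buf), 4):
        # Read the whole unit before writing, since the output may
        # overlap the unit currently being read when w == r.
        cp = int.from_bytes(buf[r:r + 4], "little")
        n = utf8_len(cp)
        if n == 1:
            buf[w] = cp
        elif n == 2:
            buf[w] = 0xC0 | (cp >> 6)
            buf[w + 1] = 0x80 | (cp & 0x3F)
        elif n == 3:
            buf[w] = 0xE0 | (cp >> 12)
            buf[w + 1] = 0x80 | ((cp >> 6) & 0x3F)
            buf[w + 2] = 0x80 | (cp & 0x3F)
        else:
            buf[w] = 0xF0 | (cp >> 18)
            buf[w + 1] = 0x80 | ((cp >> 12) & 0x3F)
            buf[w + 2] = 0x80 | ((cp >> 6) & 0x3F)
            buf[w + 3] = 0x80 | (cp & 0x3F)
        w += n
    del buf[w:]  # drop the unused tail
    return w
```

With 5- and 6-byte sequences allowed, a single 4-byte UTF-32 unit could
expand on output, and the same loop would overwrite input it has not
read yet.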

Almost all present-day scripts have already been encoded, and plenty
of historical ones too. Even 18 or 19 bits would have been plenty. I
see no legitimate practical argument against a 21-bit limit; going
beyond it just increases the potential for implementation complexity
with no benefit.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
