On Tue, Apr 24, 2007 at 04:43:59PM -0400, SrinTuar wrote:
> Basically, its a proposal to cap at 10FFFF.
>
> I see no reason to cap utf-8 and utf-32 just to deal with the
> limitations of utf-16.
>
> As long as you don't attempt to convert to utf-16, it should not be a
> problem. (and eventually, utf-16 should be phased out)

Capping is a good thing, and 21 bits is exactly the point you want to
cap at. Not only does it ensure that the table indices required for UCS
support can't grow unmanageably large; it also ensures that a character
is never longer in UTF-8 than in UTF-32 (at most 4 bytes), so that
conversion can be done in place in situations where storage space is
limited. Almost all present-day scripts have already been encoded, and
plenty of historical ones too. Even 18 or 19 bits would have been
plenty. I see no legitimate practical argument against a 21-bit limit;
going beyond it just increases the potential for implementation
complexity with no benefits.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
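
To illustrate the in-place conversion point, here is a minimal sketch in C, not taken from any particular library. The function names utf8_encode and utf32_to_utf8_inplace are made up for this example, and the code assumes the UTF-32 input is in native byte order. With the cap at 10FFFF every code point encodes to at most 4 bytes of UTF-8, so the write position can never pass the read position. The original 31-bit form of UTF-8 allowed sequences of up to 6 bytes, which would break that guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one code point as UTF-8. Returns the number of bytes written
 * (1 to 4), or 0 if cp is a surrogate or lies above the 10FFFF cap. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* beyond the 21-bit cap */
}

/* Convert n UTF-32 code units stored at buf (native byte order, an
 * assumption of this sketch) to UTF-8 inside the same buffer.
 * Returns the UTF-8 length in bytes, or (size_t)-1 on an invalid code
 * point, in which case the buffer is left partially converted. */
size_t utf32_to_utf8_inplace(void *buf, size_t n)
{
    unsigned char *bytes = buf;
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp;
        unsigned char tmp[4];

        /* Read the code unit before anything is written over it. */
        memcpy(&cp, bytes + 4 * i, sizeof cp);

        size_t len = utf8_encode(cp, tmp);
        if (len == 0)
            return (size_t)-1;

        /* len <= 4, so output never overtakes the consumed input. */
        assert(out + len <= 4 * (i + 1));
        memcpy(bytes + out, tmp, len);
        out += len;
    }
    return out;
}
```

For example, a buffer holding the four code points U+0041, U+00E9, U+20AC and U+10FFFF (16 bytes as UTF-32) comes back as 10 bytes of UTF-8 at the start of the same buffer, with no second buffer needed.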
