On Tue, Jan 24, 2012 at 5:14 PM, Allen Wirfs-Brock <[email protected]> wrote:
> The current 16-bit character strings are sometimes used to store non-Unicode
> binary data and can be used with non-Unicode character encodings with up to
> 16-bit chars. 21 bits is sufficient for Unicode but perhaps is not enough
> for other useful encodings. 32-bit seems like a plausible unit.
People only use strings to store binary data because they didn't have native
binary data types. Now they do. Continuing to optimize strings for this
use-case seems unnecessary.

> The real controversy that developed over this proposal regarded whether or
> not every individual Unicode character needs to be uniformly representable
> as a single element of a String. This proposal took the position that they
> should. Other voices felt that such uniformity was unnecessary and seemed
> content to expose UTF-8 or UTF-16. The argument was that applications may
> have to look at multiple-character logical units anyway, so dealing with
> UTF encodings isn't much of an added burden.

Anyone who argues that authors should have to deal with multibyte characters
spread across more than one element in a string has never tried to deal with
having a non-BMP name on the web. UTF-16 is particularly horrible in this
regard, as "most" names authors will see (if they're not serving a CJK
audience explicitly) are in the BMP and thus are a single element. UTF-8 at
least has the "advantage" that authors are somewhat more likely to encounter
problems if they assume 1 character = 1 element.

Making strings more complicated is, unfortunately, user-hostile toward people
with names outside of ASCII or the BMP.

~TJ
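
A minimal sketch of the problem described above, assuming an ES5-era engine
where string elements are 16-bit code units (the character and the user name
below are only illustrative):

    // U+1D306, a non-BMP character, must be written as a surrogate pair.
    var ch = "\uD834\uDF06";
    ch.length;                        // 2 -- one character, two string elements
    ch.charAt(0);                     // lone high surrogate, not a usable character
    ch.split("").reverse().join("");  // silently corrupts the pair

    // The same "1 character = 1 element" assumption applied to a name:
    var name = "Ana" + ch;            // hypothetical user name containing U+1D306
    name.length;                      // 5, though a user would count 4 characters
    name.substring(0, 4);             // truncation splits the surrogate pair

Code tested only against BMP names never hits these cases, which is exactly
the trap described for UTF-16 above.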

