G'day all.

On Fri, Oct 05, 2001 at 02:29:51AM -0700, Krasimir Angelov wrote:
> Why Char is 32 bit. UniCode characters is 16 bit.

It's not quite as simple as that.  There is a set of just over one
million (more precisely, 1M = 2^20) Unicode characters which are only
accessible using surrogate pairs (i.e. two UTF-16 code units).  Very
few of these characters are assigned at the moment, and those that
are will be extremely rare.  So rare, in fact, that having every
string take up twice the space it currently does simply isn't worth
it.  However, you still need to be able to handle them.

I don't know what the "official" Haskell reasoning is (it may have
more to do with word size than Unicode semantics), but it makes sense
to me to store single characters in UTF-32 but strings in a more
compressed format (UTF-8 or UTF-16).

See also: http://www.unicode.org/unicode/faq/utf_bom.html

It just goes to show that strings are not merely arrays of
characters, as some languages would have you believe.

Cheers,
Andrew Bromage

_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
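For what it's worth, here's a small Haskell sketch (my own illustration, not from the post above; the name toSurrogatePair is made up) of what "two UTF-16 code units" means in practice: a code point above U+FFFF has 0x10000 subtracted, and the resulting 20 bits are split across a high surrogate (0xD800-0xDBFF) and a low surrogate (0xDC00-0xDFFF).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)

-- Encode a code point above U+FFFF as a UTF-16 surrogate pair.
-- (Code points at or below U+FFFF need only one code unit.)
toSurrogatePair :: Char -> (Int, Int)
toSurrogatePair c =
  let v  = ord c - 0x10000          -- 20 bits remain after the offset
      hi = 0xD800 .|. (v `shiftR` 10)  -- top 10 bits -> high surrogate
      lo = 0xDC00 .|. (v .&. 0x3FF)    -- low 10 bits -> low surrogate
  in (hi, lo)

main :: IO ()
main = print (toSurrogatePair '\x10400')  -- (55297,56320), i.e. (0xD801,0xDC00)
```

So a UTF-32 Char holds any such character directly in one 32-bit value, while a UTF-16 string needs two code units for it, which is exactly why string length and character count can diverge.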
> Why Char is 32 bit. UniCode characters is 16 bit. It's not quite as simple as that. There is a set of one million (more correctly, 1M) Unicode characters which are only accessible using surrogate pairs (i.e. two UTF-16 codes). There are currently none of these codes assigned, and when they are, they'll be extremely rare. So rare, in fact, that the cost of strings taking up twice the space that the currently do simply isn't worth the cost. However, you still need to be able to handle them. I don't know what the "official" Haskell reasoning is (it may have more to do with word size than Unicode semantics), but it makes sense to me to store single characters in UTF-32 but strings in a more compressed format (UTF-8 or UTF-16). See also: http://www.unicode.org/unicode/faq/utf_bom.html It just goes to show that strings are not merely arrays of characters like some languages would have you believe. Cheers, Andrew Bromage _______________________________________________ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users