On Mon, 2004-02-09 at 02:22, gabor wrote:
<snip/>
> i just can't understand why the designers of dotnet didn't look at the
> unicode standards. i can understand that java has this problem, but java
> is much older than dotnet.
>
> maybe it's because winapi uses 16-bit characters?
I imagine it's due to a memory trade-off. The easiest approach for the
programmer would be to just use UTF-32 (UCS-4) for all Unicode strings:
you wouldn't have to worry about surrogate pairs or anything else like
that. But it would also mean that every string requires 32 bits per
character, which would eat up *lots* of memory. The most common code
points -- covering the US, Europe, and Asia -- all easily fit within
16 bits, *by design*.

So the designers had a choice: use 32-bit characters internally
everywhere, forcing nearly all users to "waste" 16-24 bits per character
(1/2 to 3/4 of all memory dedicated to strings), or use 16-bit characters
internally, which suits the needs of most current users (probably > 80%)
while only "wasting" 8 bits per character for the US and parts of Europe,
a minority of the world's population. I would assume 16-bit characters
were considered a decent trade-off.

An alternative approach would have been for the string type to convert on
the fly between Unicode code points and an internal representation such as
UTF-16. This would imply that System.Char is a 32-bit structure, and that
System.String doesn't conceptually store a char[] array, but rather some
implementation-defined encoding of that array, to save memory. This could
be argued to complicate things, but I don't know why else this strategy
wouldn't work.

- Jon

_______________________________________________
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list
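The memory trade-off described above can be sketched concretely. This is a
hedged illustration in plain Python (not from the original post): a string
made of BMP characters costs half as many bytes in UTF-16 as in a fixed
32-bit encoding, while a code point beyond the BMP falls back to a
surrogate pair of two 16-bit units.

```python
# Sketch: storage cost of UTF-16 vs. a fixed 32-bit encoding,
# and how a supplementary code point becomes a surrogate pair.

def utf16_units(s):
    """Number of 16-bit code units needed to encode s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

bmp_text = "h\u00e9llo"      # all code points fit in 16 bits
astral = "\U0001F600"        # U+1F600, outside the BMP

# Every BMP character takes exactly one 16-bit unit...
assert utf16_units(bmp_text) == 5

# ...but a supplementary character needs a surrogate pair (two units).
assert utf16_units(astral) == 2

# A fixed 32-bit encoding always spends 4 bytes per code point, so
# BMP-heavy text costs twice as much as its UTF-16 form.
assert len(bmp_text.encode("utf-32-le")) == 4 * 5
assert len(bmp_text.encode("utf-16-le")) == 2 * 5
```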
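The "32-bit char over an internal UTF-16 buffer" alternative can also be
sketched. The class below is purely hypothetical (its name and API are my
invention, not anything in .NET): the public interface yields full code
points, while storage stays UTF-16 and surrogate pairs are combined on the
fly during decoding.

```python
# Hypothetical sketch of a string type whose API exposes 32-bit code
# points while the internal representation is UTF-16.

class CodePointString:
    def __init__(self, text):
        # Internal buffer: UTF-16, half the size of UTF-32 for BMP text.
        self._buf = text.encode("utf-16-le")

    def code_points(self):
        """Decode on the fly, yielding full 32-bit code points."""
        units = [int.from_bytes(self._buf[i:i + 2], "little")
                 for i in range(0, len(self._buf), 2)]
        i = 0
        while i < len(units):
            u = units[i]
            if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
                # Combine a high/low surrogate pair into one code point.
                lo = units[i + 1]
                yield 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
                i += 2
            else:
                yield u
                i += 1

s = CodePointString("a\U0001F600")
assert list(s.code_points()) == [0x61, 0x1F600]
```

The design cost the post alludes to is visible here: indexing by code
point is no longer O(1), since the buffer has to be scanned past any
surrogate pairs.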
