On 11/18/2009 09:29 PM, Kenneth Marshall wrote:
> I thought that UTF8, UTF-16 and UTF-32 can represent all the characters.
> In that case, why wouldn't you use the UTF8 equivalent? At the least it
> would save space.

They can, but it's about *how* they do it. UTF-32 simply stores every character (code point) as one 32-bit number. Easy, but it's 4x bigger for standard western (ASCII) text.

UTF-8 stores the same numbers as UTF-32, but with a variable-length encoding to counter that expansion. It isn't compression in the strict sense, more a compact packing. Instead of spending 32 bits on every character, it uses 8 bits most of the time for western text. Of course it can't do this for every character, or nobody would have needed 32 bits in the first place :) So for some characters UTF-8 uses 16 bits (you notice this when you open a UTF-8 encoded text file in an editor that doesn't know the encoding, and you see 2 characters instead of 1). Other characters take 24 bits, and the rest take 32 bits. So UTF-8 can represent everything UTF-32 can, with only a small size penalty compared to plain 8-bit text.

If you store text on disk, UTF-32 is a poor choice: it's inefficient. But from a programming point of view, it's really easy to work with. Everything is a number of fixed length. If you use UTF-8, you have to decode each character to know how wide it is. Is it 8, 16, 24 or 32 bits? Your code has to 'know' the different cases. Not fun to work with :)

So that's why it's common to store your files in UTF-8, read them in and convert to UTF-32, process, then convert back to UTF-8 and write back to disk.

Alexander

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
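
A minimal Python sketch of the byte widths and the decode/process/encode round trip described in the message above. It is not from the original thread: the sample characters and the helper names `encoded_sizes` and `roundtrip` are illustrative assumptions, and Python's own `str` type stands in for the "convert to UTF-32" step, since it exposes text one code point at a time.

```python
# Illustrative sample characters (assumed, not from the original message):
# Latin capital A, e with acute accent, euro sign, and an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The -le variants are used so that no byte order mark is prepended.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


def roundtrip(data):
    """Decode UTF-8 bytes, handle the text per code point, encode back to UTF-8."""
    text = data.decode("utf-8")           # variable width in, one item per code point out
    code_points = [ord(c) for c in text]  # plain numbers, easy to index and compare
    processed = "".join(chr(cp) for cp in code_points)  # placeholder for real processing
    return processed.encode("utf-8")


if __name__ == "__main__":
    for ch in SAMPLES:
        # ascii() keeps the output printable on consoles without Unicode support.
        print(ascii(ch), encoded_sizes(ch))
```

For the euro sign, for example, this reports 3 bytes in UTF-8, 2 in UTF-16 and 4 in UTF-32, which is the 24-bit case mentioned above.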