On 11/18/2009 09:29 PM, Kenneth Marshall wrote:
> I thought that UTF8, UTF-16 and UTF-32 can represent all the characters.
> In that case, why wouldn't you use the UTF8 equivalent? At the least it
> would save space.

They can, but it's about *how* they do it.

UTF32 just represents every character you can think of as one number 
of 32 bits. Easy, but it's 4x bigger than an 8-bit encoding for 
standard western text.

UTF8 stores the same numbers as UTF32, but in a variable number of 
bytes, which you can think of as a kind of compression to counter the 
expansion effect. Instead of using 32 bits for every character it uses 
only 8 bits most of the time (for plain ASCII). Of course you can't do 
this for each character, as one byte can only tell 128 characters 
apart in UTF8. So for some characters UTF8 needs 16 bits (you notice 
this when you open a text file which is UTF8 encoded but your text 
editor doesn't know this and you see 2 characters instead of 1). And 
for other characters UTF8 uses 24 or even 32 bits... So UTF8 can 
represent everything UTF32 can, with only a small size penalty over a 
plain 8-bit encoding for western text.
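
To make the sizes concrete, here is a small sketch in Python (my choice 
of language, nothing dspam specific). The sample characters and the 
helper name encoded_sizes are only picked for illustration.

```python
# Compare how many bytes one character takes in UTF8 and in UTF32.
# The sample characters are arbitrary picks for illustration.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def encoded_sizes(ch):
    """Return (utf8_bytes, utf32_bytes) for a single character."""
    # "utf-32-le" is used so that no byte order mark is prepended.
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-le"))


for ch in samples:
    u8, u32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF8 {u8} byte(s), UTF32 {u32} bytes")
```

UTF32 always needs 4 bytes, while UTF8 needs 1 to 4 bytes depending on 
the character.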

If you store text on disk, then of course, never use UTF32, it's 
inefficient. But from a programming point of view, it's really easy to 
work with. Everything is a number of fixed length.
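
A rough illustration of why fixed length is convenient, assuming the 
text sits in memory as little-endian UTF32 bytes (nth_char_utf32 is a 
made-up helper name): character n is always at byte offset 4*n, so no 
scanning is needed.

```python
def nth_char_utf32(buf, n):
    """Return character n from a buffer of little-endian UTF32 bytes."""
    # Every character occupies exactly 4 bytes, so the offset is just 4 * n.
    return buf[4 * n : 4 * n + 4].decode("utf-32-le")


text = "na\u00efve \u20ac5"          # contains i-diaeresis and a euro sign
buf = text.encode("utf-32-le")

print(nth_char_utf32(buf, 2))        # the i-diaeresis
print(nth_char_utf32(buf, 6))        # the euro sign
```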

If you use UTF8 you have to 'decompress' to know the kind of character. 
Is it 8, 16, 24 or 32 bits wide? Your code has to 'know' the different 
cases. Not fun to work with :)
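
The different cases look roughly like this. This is only a sketch with 
made-up function names: it reads the length from the first byte of each 
sequence and leaves the full validation of the following bytes to 
Python's own decoder.

```python
def utf8_sequence_length(lead_byte):
    """Return how many bytes a UTF8 sequence takes, judged by its first byte."""
    if lead_byte < 0x80:             # 0xxxxxxx: plain ASCII
        return 1
    if 0xC2 <= lead_byte <= 0xDF:    # 110xxxxx (0xC0 and 0xC1 are never valid)
        return 2
    if 0xE0 <= lead_byte <= 0xEF:    # 1110xxxx
        return 3
    if 0xF0 <= lead_byte <= 0xF4:    # 11110xxx (above 0xF4 exceeds U+10FFFF)
        return 4
    raise ValueError(f"0x{lead_byte:02X} cannot start a UTF8 sequence")


def split_utf8(data):
    """Split UTF8 bytes into a list of single characters."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        # decode() still checks that the continuation bytes are valid.
        chars.append(data[i : i + n].decode("utf-8"))
        i += n
    return chars


print(split_utf8("A\u00e9\u20ac\U0001F600".encode("utf-8")))
```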

So that's why it happens a lot that you store your files in UTF8, but 
you read them in and convert to UTF32, then process, then convert back 
to UTF8 and write back to disk.
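
A minimal sketch of that round trip in Python, with a hypothetical 
truncate_utf8 helper: decode the UTF8 bytes into a list of fixed-width 
numbers, do the work on those, then encode back to UTF8.

```python
def truncate_utf8(data, max_chars):
    """Cut UTF8 bytes down to at most max_chars characters."""
    # Decode: every character becomes one number, as it would in UTF32.
    code_points = [ord(ch) for ch in data.decode("utf-8")]
    # Process: plain list slicing, no multi-byte cases to worry about.
    kept = code_points[:max_chars]
    # Encode: convert back to UTF8 for storage.
    return "".join(chr(cp) for cp in kept).encode("utf-8")


data = "h\u00e9llo".encode("utf-8")   # 5 characters, 6 bytes
print(truncate_utf8(data, 2))         # keeps the whole 2-byte e-acute
```

Slicing the raw bytes instead (data[:2]) would cut the second 
character in half and leave invalid UTF8 behind.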

Alexander

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
