UTF-16 can be much faster than UTF-8 in many situations.
It really depends a lot on just what you are doing and on the data you are
processing.
If your data comes mainly from North/South America, Western Europe, or
Australia/NZ (i.e. it is mostly ASCII or Latin-script text), UTF-8 does OK.
UTF-8 is great for data interchange, but it can really slow things down if
you have many non-ASCII characters. It also bloats the size of any buffers you
need: to be sure you can hold the same number of characters, you'll need to
allocate 50% more space than for UTF-16 (up to 3 bytes per UTF-16 code unit
instead of 2).
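To make the size trade-off concrete, here is a minimal Python sketch (Python is used only for illustration; the helper names encoded_sizes and utf8_worst_case and the sample strings are hypothetical). It counts encoded bytes with the standard str.encode, using "utf-16-le" so that no byte-order mark is included, and shows the worst-case rule behind the 50% figure.

```python
# Compare how many bytes UTF-8 and UTF-16 need for the same text.
# The sample strings below are illustrative only.

def encoded_sizes(text):
    """Return (UTF-8 bytes, UTF-16 bytes) for text; utf-16-le adds no BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def utf8_worst_case(utf16_units):
    """UTF-8 bytes that are always enough for text of utf16_units code units.

    A BMP character is 1 UTF-16 unit and at most 3 UTF-8 bytes; a
    supplementary character is 2 units (a surrogate pair) and exactly
    4 UTF-8 bytes. So 3 bytes per unit always suffices, which is 1.5x
    the 2 bytes per unit that UTF-16 needs.
    """
    return 3 * utf16_units


samples = {
    "English": "Hello, world",
    "French": "Déjà vu",
    "Japanese": "こんにちは世界",
}

for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name:8} UTF-8: {u8:3} bytes  UTF-16: {u16:3} bytes")
```

For these samples it reports 12 vs 24 bytes for the English text, 9 vs 14 for the French, and 21 vs 14 for the Japanese, which is where the 3-bytes-versus-2 worst case shows up.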

UTF-16 is used not only by the Windows APIs but also by ICU, Java, and the C++
UnicodeString class. Python 3 (since 3.3, per PEP 393) actually picks a 1-, 2-,
or 4-byte representation depending on which characters are in the string:
1 byte when none are > 0xff, 2 bytes (so UTF-16, but with no surrogate pairs)
when there are any characters > 0xff but none > 0xffff, and 4 bytes otherwise.
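The per-string width choice can be observed in CPython. This is a sketch assuming CPython 3.3 or later: sys.getsizeof reports implementation-specific object sizes, so the hypothetical helper bytes_per_char only looks at how much the size grows when the string gets one character longer. It may not work on other Python implementations.

```python
import sys


def bytes_per_char(ch):
    """Storage per character in a string made only of ch.

    CPython-specific assumption: a compact str stores (length + 1) * width
    bytes of character data, so growing it by one character adds exactly
    `width` bytes (1, 2 or 4, chosen by the widest character).
    """
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)


for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")

# A character above 0xFFFF is still one element of a Python str,
# whereas UTF-16 needs a surrogate pair (two code units) for it.
s = "x\U0001F600"
print(len(s), len(s.encode("utf-16-le")) // 2)
```

Under that assumption, "a" and "\u00e9" use 1 byte, "\u20ac" uses 2, and the emoji uses 4. The last line prints 2 Python characters against 3 UTF-16 code units.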

Scott
