Re: Unicode

Owen Taylor Fri, 21 Sep 2001 15:56:22 -0700

> Furthermore, the Unicode character set has tens of thousands of empty
> spaces. 

Try hundreds of thousands of empty spaces. The first 65536 code
points are pretty well filled up, but the full Unicode character space
includes somewhat over a million code points, and that's not going to
fill up any time soon.
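
For reference, the arithmetic behind "somewhat over a million" can be
sketched in a few lines of Python. The constant names here are just
illustrative, not taken from any library; the only real API used is
sys.maxunicode.

```python
import sys

# The Unicode code space is 17 planes of 65536 code points each:
# plane 0 (the "first 65536") plus 16 supplementary planes.
PLANE_SIZE = 0x10000   # 65536
NUM_PLANES = 17

total_code_points = PLANE_SIZE * NUM_PLANES
beyond_first_plane = total_code_points - PLANE_SIZE

# The highest code point is U+10FFFF, so the count is 0x110000.
highest_code_point = total_code_points - 1

print(total_code_points)        # 1114112
print(beyond_first_plane)       # 1048576
print(hex(highest_code_point))  # 0x10ffff

# On Python 3.3 and later, the interpreter reports the same upper bound.
print(sys.maxunicode == highest_code_point)
```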

A lot of new Han characters were added in Unicode 3.1, and, iirc,
more are scheduled for 3.2. It's pretty hard for someone
to complain at this point that Unicode is missing necessary
characters.


Yes, people kick and scream about their beloved legacy encodings, but
Unicode _is_ good enough as a character set, and there is no reason to
use anything else as the character set internally. Provisions, of
course, have to be made for conversion between internal encodings and
different external encodings, but this is a matter for the IO
subsystem and for utility routines, not for a language core.
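
As a rough illustration of that split, here is a minimal Python sketch
in which text is Unicode internally and conversion happens only at the
edges. The helper names read_external and write_external are
hypothetical, made up for this example; the only real calls are
bytes.decode and str.encode.

```python
def read_external(raw: bytes, encoding: str) -> str:
    """Input boundary: convert externally encoded bytes to Unicode text."""
    return raw.decode(encoding)


def write_external(text: str, encoding: str) -> bytes:
    """Output boundary: convert Unicode text to an external encoding."""
    return text.encode(encoding)


# Bytes arriving in a legacy encoding (ISO-8859-1 here).
legacy_bytes = b"caf\xe9"

# Decode once on the way in; everything after this point sees
# only Unicode text and never needs to know the source encoding.
text = read_external(legacy_bytes, "iso-8859-1")
shouted = text.upper()

# Encode once on the way out, in whatever the consumer expects.
utf8_bytes = write_external(shouted, "utf-8")
```

The core string operations (upper() above) never deal with encodings;
only the two boundary helpers do.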

While having a single character set does _not_ give you full
internationalization, having a single character set makes
internationalization _much_ easier.

Remember, in many contexts:

 - Network protocols
 - File systems
 - A web server serving multiple clients

"The encoding of the current locale" is a meaningless concept, so the
only real alternative to using a standardized character set and
encoding is to use an encoding that tags character sets (iso-2022) or
to tag every string with an encoding. Expecting people to get this
right is _far_ too high a burden and basically means that programs
will be broken for internationalization.
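
To make the problem concrete, here is a small Python sketch of what
goes wrong when bytes travel without a known encoding. The TaggedString
type is hypothetical, invented only to show what "tag every string with
an encoding" would look like; it is not a proposal.

```python
from typing import NamedTuple

# One byte received from a peer, with no encoding information.
raw = b"\xe9"

# The same byte is a different character under different encodings.
as_latin1 = raw.decode("iso-8859-1")    # U+00E9, e with acute accent
as_cyrillic = raw.decode("iso-8859-5")  # U+0449, Cyrillic shcha


class TaggedString(NamedTuple):
    """Hypothetical 'string tagged with its encoding' alternative."""
    data: bytes
    encoding: str

    def to_text(self) -> str:
        return self.data.decode(self.encoding)


# Every producer must set the tag correctly and every consumer must
# honor it; one forgotten or wrong tag silently yields the wrong text.
right = TaggedString(raw, "iso-8859-1")
wrong = TaggedString(raw, "iso-8859-5")

print(as_latin1 != as_cyrillic)            # True
print(right.to_text() == wrong.to_text())  # False
```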

Regards,
                                        Owen