Christian M. Cepel wrote:

> It was my understanding that all unicode character sets contain English
> characters mapped to the same values they're mapped to in other sets.

Close -- Unicode is a *single* character set.  For convenience, you'll
frequently run into references to Unicode code pages (the standard itself
calls them blocks), but each one is just a range within the overall
character set.  Every character from every encoding that Unicode covers
exists somewhere in that character set.

So with a Unicode (UTF-8 or UTF-16) encoded text file you could easily have
English, Chinese, Korean, Russian, and Symbol characters all in the same
sentence.
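
As a rough sketch of what that looks like at the byte level in UTF-8: the
sample string below mixes a Latin, a Cyrillic, a Chinese, and a Korean
character, and each takes a different number of bytes. The names
utf8_count_code_points and mixed_sample are made up for illustration, and
the counting loop assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (continuation bytes have the
   bit pattern 10xxxxxx).  Assumes the input is well-formed UTF-8. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Hypothetical sample: "K" (U+004B), Cyrillic U+0416, CJK U+4E2D and
   Hangul U+D55C, written out as raw UTF-8 bytes (1 + 2 + 3 + 3 bytes). */
static const char mixed_sample[] = "K\xD0\x96\xE4\xB8\xAD\xED\x95\x9C";
```

Calling utf8_count_code_points(mixed_sample) counts four characters even
though the string occupies nine bytes, with the English 'K' still stored as
the single byte 0x4B.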

Another convenient item is that the first Unicode range, 0x0000 - 0x007f,
is the ASCII code.  So if you're using wchar_t instead of char as your
string's character type, then comparisons like:

    if (str[0] == 'K')

...will work the same when using Unicode or ASCII.  The only difference is
that str now points to an array of wide values (16 bits on Windows, commonly
32 bits elsewhere) instead of 8 bit ones.
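
A minimal sketch of that comparison in both flavors, assuming standard C
with <wchar.h>; the two function names are invented for this example:

```c
#include <wchar.h>

/* Narrow version: str is an array of char. */
int starts_with_k_narrow(const char *str)
{
    return str[0] == 'K';
}

/* Wide version: str is an array of wchar_t.  The comparison is written
   the same way, because 'K' has the value 0x4B in ASCII and in Unicode
   alike.  The wide literal L'K' would work equally well here. */
int starts_with_k_wide(const wchar_t *str)
{
    return str[0] == 'K';
}
```

For instance, starts_with_k_narrow("Kiwi") and starts_with_k_wide(L"Kiwi")
should both return 1; only the string literal gains an L prefix.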

-->Steve Bennett
