DM Smith wrote:
> Daniel Naber wrote:
>> But wouldn't UTF-16 mean 2 bytes per character? That doesn't seem to be the case.
>
> UTF-16 is a fixed 2 byte/char representation.

Except when it isn't, i.e., for code points above the BMP (Basic Multilingual Plane).
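To see the difference in practice, here is a minimal Java sketch (the class and method names are made up for illustration). It encodes one BMP character and one supplementary character with UTF-16BE; the big-endian variant is used only so that Java's plain "UTF-16" charset doesn't add a 2-byte byte-order mark to the count:

```java
import java.nio.charset.StandardCharsets;

public class Utf16Lengths {
    // Bytes the string occupies in UTF-16, excluding any byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Print code points, UTF-16 code units (Java chars), and encoded bytes.
    static void report(String label, String s) {
        int codePoints = s.codePointCount(0, s.length());
        System.out.println(label + ": " + codePoints + " code point(s), "
                + s.length() + " UTF-16 code unit(s), " + utf16Bytes(s) + " bytes");
    }

    public static void main(String[] args) {
        report("U+00E9", "\u00E9");                                  // inside the BMP
        report("U+1D11E", new String(Character.toChars(0x1D11E)));  // supplementary plane
    }
}
```

It should print 1 code unit (2 bytes) for U+00E9 but 2 code units (4 bytes) for U+1D11E, even though each string holds a single character.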

From the Unicode 4.0 standard <http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>:

   In the UTF-16 encoding form, code points in the
   range U+0000..U+FFFF are represented as a single
   16-bit code unit; code points in the supplementary
   planes, in the range U+10000..U+10FFFF, are
   instead represented as pairs of 16-bit code units.
   These pairs of special code units are known as
   surrogate pairs.
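For the curious, the pair is built by subtracting 0x10000 and splitting the remaining 20 bits across a high surrogate (0xD800 + top 10 bits) and a low surrogate (0xDC00 + bottom 10 bits). A rough Java sketch of that arithmetic (class and method names are illustrative only), cross-checked against the JDK's Character.toChars:

```java
import java.util.Arrays;

public class SurrogatePairs {
    // Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int v = codePoint - 0x10000;               // 20-bit offset
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits -> high surrogate
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits -> low surrogate
        return new char[] { high, low };
    }

    // Recombine a high/low surrogate pair into the original code point.
    static int fromSurrogates(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1D11E);
        System.out.printf("U+1D11E -> %04X %04X%n", (int) pair[0], (int) pair[1]);
        // Should agree with the JDK's own encoder.
        System.out.println(Arrays.equals(pair, Character.toChars(0x1D11E)));
        System.out.printf("U+%X%n", fromSurrogates(pair[0], pair[1]));
    }
}
```

For U+1D11E (MUSICAL SYMBOL G CLEF) this gives the pair D834 DD1E, which is why such a character costs 4 bytes in UTF-16 rather than 2.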
