DM Smith wrote:
> Daniel Naber wrote:
>> But wouldn't UTF-16 mean 2 bytes per character? That doesn't seem to
>> be the case.
>
> UTF-16 is a fixed 2 byte/char representation.

Except when it's not, i.e., for code points above the BMP (Basic
Multilingual Plane), which take two 16-bit code units (4 bytes).
From the Unicode 4.0 standard
<http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>:

    In the UTF-16 encoding form, code points in the
    range U+0000..U+FFFF are represented as a single
    16-bit code unit; code points in the supplementary
    planes, in the range U+10000..U+10FFFF, are
    instead represented as pairs of 16-bit code units.
    These pairs of special code units are known as
    surrogate pairs.
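
For what it's worth, here is a minimal Java sketch of that rule. It relies on a Java char being one UTF-16 code unit. The class name Utf16Demo and its two helper methods are made up for illustration; only the java.lang.Character, String, and StandardCharsets calls are standard JDK API. U+0041 and U+1D11E are just sample code points, one inside the BMP and one above it.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // Hypothetical helper: number of 16-bit code units UTF-16 needs for one code point.
    static int codeUnits(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    // Hypothetical helper: number of bytes when serialized as UTF-16BE (no byte order mark).
    static int utf16Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int bmp = 0x0041;            // LATIN CAPITAL LETTER A, inside the BMP
        int supplementary = 0x1D11E; // MUSICAL SYMBOL G CLEF, in a supplementary plane

        System.out.println("U+0041 code units:  " + codeUnits(bmp));
        System.out.println("U+1D11E code units: " + codeUnits(supplementary));
        System.out.println("U+0041 bytes:       " + utf16Bytes(bmp));
        System.out.println("U+1D11E bytes:      " + utf16Bytes(supplementary));

        // String.length() counts code units, not code points.
        String clef = new String(Character.toChars(supplementary));
        System.out.println("length():           " + clef.length());
        System.out.println("codePointCount():   " + clef.codePointCount(0, clef.length()));

        // The two code units form a surrogate pair: high surrogate, then low surrogate.
        System.out.println("high surrogate:     " + Character.isHighSurrogate(clef.charAt(0)));
        System.out.println("low surrogate:      " + Character.isLowSurrogate(clef.charAt(1)));
    }
}
```

So anything that assumes "one char == one character" (or a fixed 2 bytes per character) is only safe as long as the text stays within the BMP.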