On 10/04/2013 at 10:31 AM, Charles Mills said:
>It is easy to write UCS-2 or UTF-16 or whatever it is called
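UCS-2 is called UCS-2 and UTF-16 is called UTF-16; they are not the
same. UCS-2 encodes only the Basic Multilingual Plane (BMP), two bytes
per code point, while UTF-16 also reaches the supplementary planes by
using surrogate pairs.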
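A minimal Python sketch of the difference, assuming Python 3.9+. Python
has no UCS-2 codec, so `encode_ucs2_be` and `utf16_surrogates` are
hypothetical helpers written for illustration; `utf-16-be` is Python's
standard big-endian UTF-16 codec.

```python
def encode_ucs2_be(text: str) -> bytes:
    """Hypothetical helper: big-endian UCS-2, two bytes per code point, BMP only."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no surrogate mechanism, so it cannot represent
            # anything outside the Basic Multilingual Plane.
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        out += cp.to_bytes(2, "big")
    return bytes(out)


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    v = cp - 0x10000  # 20 bits remain
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


e_acute = "\u00e9"       # in the BMP: UCS-2 and UTF-16 produce the same bytes
grinning = "\U0001F600"  # outside the BMP: only UTF-16 can encode it

print(encode_ucs2_be(e_acute).hex())        # 00e9
print(e_acute.encode("utf-16-be").hex())    # 00e9
print(grinning.encode("utf-16-be").hex())   # d83dde00
print([hex(u) for u in utf16_surrogates(0x1F600)])  # ['0xd83d', '0xde00']
try:
    encode_ucs2_be(grinning)
except ValueError as err:
    print(err)  # U+1F600 is outside the BMP
```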
>All you are doing with UCS-2 is making it harder to test the outlier
>conditions.
No. Anything that is an outlier for UCS-2 is also an outlier for
UTF-8, and UCS-2 is less complex, since it covers only the BMP and
uses the same size, two bytes, for every code point. The only real
issues with UCS-2 are the BOM (byte order mark) and the fact that it
takes nearly twice the space of UTF-8 for mostly ASCII text.
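A short Python sketch of both costs, using only the standard `codecs`
module; the sample string is arbitrary. For text that stays inside the
BMP, Python's `utf-16-be` output is byte-for-byte what UCS-2 would
produce.

```python
import codecs

text = "HELLO, IBM-MAIN"  # 15 ASCII characters

utf8 = text.encode("utf-8")          # one byte per ASCII character
ucs2_be = text.encode("utf-16-be")   # two bytes per character, no BOM
with_bom = codecs.BOM_UTF16_BE + ucs2_be  # FE FF marks big-endian order

print(len(utf8), len(ucs2_be), len(with_bom))  # 15 30 32

# Without a BOM the reader must already know the byte order; the same
# two bytes read little-endian give a different character:
print(hex(ord(b"\x00A".decode("utf-16-le"))))  # 0x4100, not 'A'

# With a BOM, the generic "utf-16" codec detects the order and drops the mark:
print(with_bom.decode("utf-16"))  # HELLO, IBM-MAIN
```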
>The article also points out that "what is a character?" is not a simple
>question,
It is simple in Unicode: a character is not the same as a glyph.
>Ditto for combining characters. é (hope that makes it through the
>listserver) may I believe be legitimately encoded as two "computer"
>characters, but everyone considers it culturally to be a single
>character.
U+0065 U+0301 (e followed by COMBINING ACUTE ACCENT) is two
characters, even though it is canonically equivalent to U+00E9 and
you would normally render it with the same glyph. See "The Unicode
Standard, Version 5.0".
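A minimal Python sketch of the distinction, using the standard
`unicodedata` module. The combining form of the acute accent is U+0301
COMBINING ACUTE ACCENT; U+00B4 ACUTE ACCENT is a spacing character, so
e followed by U+00B4 does not normalize to U+00E9.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: one character
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT: two characters
spacing = "e\u00b4"     # e + ACUTE ACCENT, a spacing mark, not a combining one

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False: different code point sequences

# Canonical equivalence: normalization maps one form onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Canonical combining class: nonzero only for combining marks.
print(unicodedata.combining("\u0301"), unicodedata.combining("\u00b4"))  # 230 0
print(unicodedata.normalize("NFC", spacing) == precomposed)  # False
```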
--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see <http://patriot.net/~shmuel/resume/brief.html>
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN