On Mon, 23 Sep 2013 16:56:46 -0700, Charles Mills wrote:
>
>"Unicode" is not a character set (or "format") -- it's a whole family of 
>character sets. http://en.wikipedia.org/wiki/Unicode. If it's UTF-8 then you 
>can do a 98% job if you just treat it as ASCII. If it's UTF-16 or UCS-2 you 
>can do a 98% job if you just discard bytes 0, 2, 4, ... and treat bytes 1, 3, 
>5, ... as ASCII.
>
A little misleading, as I see it.  There's only one set of code points, but,
yes, multiple encoding methods (op. cit.).  This is similar to saying that
there are two (or more) USASCII character sets because they're represented
big-endian in storage but little-endian in network transmission.
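For anyone who wants to see both points concretely, here is a minimal Python sketch. It is an illustration, not something either poster wrote, and the sample string and the helper name ascii_shortcut_utf16 are hypothetical. It shows one set of code points coming back from three different byte encodings, and the byte-discard shortcut quoted above. It assumes the text is entirely in the ASCII range and carries no byte-order mark. Keeping bytes 1, 3, 5, ... is right for big-endian UTF-16; little-endian data needs bytes 0, 2, 4, ... instead.

```python
def ascii_shortcut_utf16(data: bytes, big_endian: bool = True) -> str:
    # Hypothetical helper: keep the low-order byte of each 2-byte UTF-16
    # code unit and read the result as ASCII.
    # Big-endian: low bytes sit at offsets 1, 3, 5, ...
    # Little-endian: low bytes sit at offsets 0, 2, 4, ...
    # Only exact when every character is in U+0000..U+007F; otherwise the
    # result is silently wrong or decode() raises UnicodeDecodeError.
    low_bytes = data[1::2] if big_endian else data[0::2]
    return low_bytes.decode("ascii")


text = "IBM-MAIN"  # hypothetical sample, ASCII-range characters only
code_points = [ord(ch) for ch in text]

# One set of code points, three byte-level encodings of it.
encodings = {
    "utf-8": text.encode("utf-8"),          # 1 byte per ASCII-range character
    "utf-16-be": text.encode("utf-16-be"),  # 2 bytes each, high byte first
    "utf-16-le": text.encode("utf-16-le"),  # 2 bytes each, low byte first
}

for codec, data in encodings.items():
    # Every encoding decodes back to the same code points.
    assert [ord(ch) for ch in data.decode(codec)] == code_points
    print(codec, data.hex(" "))

# For ASCII-range text, UTF-8 bytes are identical to ASCII bytes.
assert encodings["utf-8"] == text.encode("ascii")

print(ascii_shortcut_utf16(encodings["utf-16-be"]))
print(ascii_shortcut_utf16(encodings["utf-16-le"], big_endian=False))
```

The explicit "utf-16-be" and "utf-16-le" codecs are used because they emit no byte-order mark; plain "utf-16" would prepend one and shift every offset by two bytes.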

>There is actually a "Unicode EBCDIC" (UTF-EBCDIC) but it's pretty obscure.
>
Not as obscure as it deserves to be.

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN