Yer right. It's a single character set (all the characters in the world! -- 
well, not quite: Jurchen, Nü Shu, Tangut, and Linear A are "working their way 
through the approval process;" Klingon is ineligible because of "lack of real 
world use") and a variety of ways of encoding them. Okay?

It's not "a format," right?
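
A minimal Python sketch of that distinction, using only the standard codecs: one code point, several byte encodings. The sample character and the list of encodings are chosen for illustration and are not taken from the thread.

```python
# One code point, several encodings (sample character chosen for illustration).
ch = "\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]
encoded = {name: ch.encode(name) for name in encodings}

for name, raw in encoded.items():
    # The byte sequences differ, but each decodes back to the same code point.
    code_point = ord(raw.decode(name))
    print(f"{name:10} {raw.hex()}  -> U+{code_point:04X}")
```

The bytes differ per encoding, while the code point they decode to stays the same.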

Also, a fairly obvious typo in what I wrote: "treat bytes 1, 3, 5, ... as 
ASCII."
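
For what it's worth, a rough Python sketch of that shortcut. It assumes big-endian UTF-16 or UCS-2 with no byte-order mark; the function name is hypothetical, and this is only the "98% job," not a real conversion.

```python
def utf16be_to_ascii_rough(raw: bytes) -> str:
    """Drop bytes 0, 2, 4, ... and read bytes 1, 3, 5, ... as ASCII.

    Assumption: big-endian UTF-16/UCS-2 input without a byte-order mark.
    Low bytes outside the ASCII range become U+FFFD (the replacement character).
    """
    low_bytes = raw[1::2]  # keep only the odd-offset bytes
    return low_bytes.decode("ascii", errors="replace")


plain = utf16be_to_ascii_rough("HELLO".encode("utf-16-be"))
accented = utf16be_to_ascii_rough("\u00e9".encode("utf-16-be"))
```

Note the failure mode: a character whose low byte happens to fall in the ASCII range (U+0141, for instance, has low byte 0x41) comes out as the wrong letter with no warning, which is part of the missing 2%.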

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Paul Gilmartin
Sent: Monday, September 23, 2013 5:18 PM
To: [email protected]
Subject: Re: UNICODE to EBCDIC

On Mon, 23 Sep 2013 16:56:46 -0700, Charles Mills wrote:
>
>"Unicode" is not a character set (or "format") -- it's a whole family of 
>character sets. http://en.wikipedia.org/wiki/Unicode. If it's UTF-8 then you 
>can do a 98% job if you just treat it as ASCII. If it's UTF-16 or UCS-2 you 
>can do a 98% job if you just discard bytes 0, 2, 4, ... and treat bytes 1, 2, 
>5, ... as ASCII.
>
A little misleading, as I see it.  There's only one set of code points, but, 
yes, multiple encoding methods (op. cit.).  This is similar to saying that 
there are two (or more) USASCII character sets because they're represented 
big-endian in storage but little-endian in network transmission.

>There is actually a "Unicode EBCDIC" (UTF-EBCDIC) but it's pretty obscure.
>
Not as obscure as it deserves to be.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
