Yer right. It's a single character set (all the characters in the world! -- well, not quite: Jurchen, Nü Shu, Tangut, and Linear A are "working their way through the approval process;" Klingon is ineligible because of "lack of real world use") and a variety of ways of encoding them. Okay?
It's not "a format," right?

Also, a fairly obvious typo in what I wrote: "treat bytes 1, 3, 5, ... as ASCII."

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Paul Gilmartin
Sent: Monday, September 23, 2013 5:18 PM
To: [email protected]
Subject: Re: UNICODE to EBCDIC

On Mon, 23 Sep 2013 16:56:46 -0700, Charles Mills wrote:
>
>"Unicode" is not a character set (or "format") -- it's a whole family of
>character sets. http://en.wikipedia.org/wiki/Unicode. If it's UTF-8 then you
>can do a 98% job if you just treat it as ASCII. If it's UTF-16 or UCS-2 you
>can do a 98% job if you just discard bytes 0, 2, 4, ... and treat bytes 1, 2,
>5, ... as ASCII.
>
A little misleading, as I see it. There's only one set of code points, but,
yes, multiple encoding methods (op. cit.). This is similar to saying that
there are two (or more) USASCII character sets because they're represented
big-endian in storage but little-endian in network transmission.

>There is actually a "Unicode EBCDIC" (UTF-EBCDIC) but it's pretty obscure.
>
Not as obscure as it deserves to be.
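
For anyone who wants to see the "98% job" concretely, here is a rough Python sketch of the shortcut discussed above, next to a real conversion. It rests on assumptions the thread does not state: the UTF-16 input is big-endian with no byte-order mark, and Python's cp037 codec stands in for "EBCDIC" (no code page was named). The function names are made up for illustration.

```python
# Sketch only. Assumptions: UTF-16 input is big-endian with no BOM, and
# cp037 is used as the EBCDIC code page (the thread does not name one).

def utf16be_low_bytes_as_ascii(data: bytes) -> str:
    """Discard bytes 0, 2, 4, ... and treat bytes 1, 3, 5, ... as ASCII."""
    low_bytes = data[1::2]  # low-order byte of each 16-bit code unit
    # Bytes above 0x7F are not ASCII; show them as U+FFFD instead of failing.
    return low_bytes.decode("ascii", errors="replace")


def unicode_to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode already-decoded text to one EBCDIC code page.

    Characters the code page cannot represent become '?'.
    """
    return text.encode(codepage, errors="replace")


text = "HELLO"

# One set of code points, several encodings of it.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")
print(utf8.hex())    # for plain ASCII text, UTF-8 bytes are the ASCII bytes
print(utf16.hex())   # each character becomes a zero byte plus the ASCII byte

# The shortcut recovers plain ASCII text from UTF-16BE.
print(utf16be_low_bytes_as_ascii(utf16))

# The non-shortcut route: decode the input, then encode to EBCDIC.
print(unicode_to_ebcdic(utf16.decode("utf-16-be")).hex())

# Where the shortcut goes wrong: U+0141 is stored as 01 41 in UTF-16BE,
# so dropping the high byte silently turns it into "A".
print(utf16be_low_bytes_as_ascii("\u0141".encode("utf-16-be")))
```

The last line is the sort of case hiding in the other 2%: the shortcut does not fail, it just hands back the wrong character. Decoding first and then encoding to a specific code page at least makes the loss visible as a replacement character.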
