Yup.

Charles
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Robert A. Rosenberg
Sent: Thursday, January 09, 2014 8:19 PM
To: [email protected]
Subject: Re: Subject Unicode

At 17:45 -0800 on 01/09/2014, Charles Mills wrote about Re: Subject Unicode:

>You could use 8 bits for most characters, with cleverness that expanded
>that out to two or three bytes for more obscure characters.
>Pretty efficient, and you could make the first part of the character
>set the same as ASCII, which would make it intuitive for PC folks who
>"know" that A is X'41'. That is called UTF-8, and it's pretty good and
>pretty popular as a result. Most Web pages are in UTF-8 and I believe
>this e-mail came to you in UTF-8.

Note that that "ASCII" is "US-ASCII", i.e. codepoints x00 to x7f. UTF-8
maps each US-ASCII codepoint to the same single byte. Any codepoint from
x80 to xff (from ISO-8859-1 or Windows-1252 [which is ISO-8859-1 from
xa0 to xff with the useless ISO-8859-1 x80 to x9f codepoints replaced
with extra useful glyphs such as curved quotes and the euro symbol],
which is the normal mapping used for email and accented characters/etc)
is mapped as 2 bytes: a lead byte of xc2 or xc3 followed by a
continuation byte from x80 to xbf, so the high half of each byte is a
x8 to xf nibble. One caveat: the extra Windows-1252 glyphs at x80 to x9f
sit at Unicode codepoints above xff (the euro symbol is U+20AC, which
takes 3 bytes in UTF-8), so the 2-byte rule applies to them only after
they are translated to their Unicode codepoints.

For more info (and the gruesome details <g>), look at
https://en.wikipedia.org/wiki/UTF8.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
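To make the byte layout described above concrete, here is a minimal Python sketch of the encoding rule. The helper name `utf8_encode` is made up for illustration and is not part of any library; in practice you would just call Python's built-in `str.encode("utf-8")`, which the sketch is checked against.

```python
def utf8_encode(codepoint: int) -> bytes:
    """Encode one Unicode codepoint as UTF-8 by hand (illustrative only)."""
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("not an encodable Unicode codepoint")
    if codepoint < 0x80:
        # US-ASCII range: one byte, identical to the codepoint.
        return bytes([codepoint])
    if codepoint < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx. Covers x80 to xff, among others.
        return bytes([0xC0 | (codepoint >> 6),
                      0x80 | (codepoint & 0x3F)])
    if codepoint < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        return bytes([0xE0 | (codepoint >> 12),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    return bytes([0xF0 | (codepoint >> 18),
                  0x80 | ((codepoint >> 12) & 0x3F),
                  0x80 | ((codepoint >> 6) & 0x3F),
                  0x80 | (codepoint & 0x3F)])


if __name__ == "__main__":
    # 'A' (x41), e-acute (xe9 in ISO-8859-1), and the euro symbol, which
    # is byte x80 in Windows-1252 but codepoint U+20AC in Unicode.
    euro = bytes([0x80]).decode("cp1252")
    for ch in ("A", "\u00e9", euro):
        encoded = utf8_encode(ord(ch))
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

Run as a script, this prints one line per character with its codepoint and UTF-8 bytes in hex. The first `if` branch is the US-ASCII case, the second is the 2-byte case that covers x80 to xff.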
