Re: Subject Unicode

Charles Mills Thu, 09 Jan 2014 21:32:18 -0800

Yup.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On
Behalf Of Robert A. Rosenberg
Sent: Thursday, January 09, 2014 8:19 PM
To: [email protected]
Subject: Re: Subject Unicode

At 17:45 -0800 on 01/09/2014, Charles Mills wrote about Re: Subject
Unicode:

>You could use 8 bits for most characters, with cleverness that expanded 
>that out to two or three bytes for more obscure characters.
>Pretty efficient, and you could make the first part of the character 
>set the same as ASCII, which would make it intuitive for PC folks who 
>"know" that A is X'41'. That is called UTF-8, and it's pretty good and 
>pretty popular as a result. Most Web pages are in UTF-8 and I believe 
>this e-mail came to you in UTF-8.

Note that that "ASCII" is "US-ASCII", which is codepoints x00 to x7f.
UTF-8 maps each US-ASCII codepoint to the same single byte. Any codepoint
from x80 to xff in ISO-8859-1 (the normal mapping used for email and
accented characters/etc) is mapped as 2 bytes, both with the high bit set
(the first byte is xc2 or xc3, the second is x80 to xbf). Windows-1252 is
ISO-8859-1 from xa0 to xff, with the useless ISO-8859-1 x80 to x9f
codepoints replaced with extra useful glyphs such as curved quotes and the
euro symbol. Those replacements live at higher Unicode codepoints, so the
curved quotes and the euro symbol, for example, take 3 bytes in UTF-8.
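
A quick way to check these byte counts is to decode one byte from a legacy
code page and re-encode it as UTF-8. The following is a minimal Python
sketch, not something from the original thread; the helper name utf8_bytes
and the sample bytes are chosen purely for illustration.

```python
# Decode a single byte from a legacy code page, then re-encode it as UTF-8
# to see how many bytes the same character needs there.

def utf8_bytes(byte_value: int, codepage: str) -> bytes:
    """Return the UTF-8 encoding of one byte interpreted in `codepage`."""
    return bytes([byte_value]).decode(codepage).encode("utf-8")

# (byte value, Python codec name) pairs; illustrative samples only.
samples = [
    (0x41, "ascii"),    # 'A', plain US-ASCII
    (0xE9, "latin-1"),  # e with acute accent in ISO-8859-1
    (0xE9, "cp1252"),   # the same character in Windows-1252
    (0x80, "cp1252"),   # euro symbol in Windows-1252
    (0x93, "cp1252"),   # left curved double quote in Windows-1252
]

for value, codepage in samples:
    encoded = utf8_bytes(value, codepage)
    hex_form = " ".join(f"{b:02X}" for b in encoded)
    print(f"x{value:02X} in {codepage}: {hex_form} ({len(encoded)} bytes)")
```

The US-ASCII byte comes back unchanged, the accented letter becomes the two
bytes C3 A9, and the Windows-1252 euro symbol and curved quote each come
back as three bytes beginning with E2.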

For more info (and the gruesome details <g>), look at
https://en.wikipedia.org/wiki/UTF8.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email
to [email protected] with the message: INFO IBM-MAIN
