Re: Subject Unicode

Charles Mills Thu, 09 Jan 2014 17:47:04 -0800

There is no such thing as "French Unicode." That is the "uni" part and the 
beauty of Unicode.


There are several flavors of Unicode, but they relate to how the code points 
are stored in a file or transmitted, not to the character set. All of Unicode 
is something like a million possible characters (1,114,112 code points, to be 
exact, of which only a fraction are assigned so far). Plain old ABC, "French" 
letters like ô, symbols like €, it's all there in one big Unicode. Every letter 
always has the same code point, whether you are in America or in France.
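
To make that concrete, here is a small sketch in Python (my choice of language 
purely for illustration; nothing here is mainframe-specific). It only assumes 
Python 3, where strings are sequences of Unicode code points, and it shows that 
a character's code point is a plain integer that does not depend on locale.

```python
import sys

# Code points are plain integers, independent of language or country.
samples = {"A": 0x41, "ô": 0xF4, "€": 0x20AC}

for char, expected in samples.items():
    code_point = ord(char)           # character -> code point
    assert code_point == expected
    assert chr(code_point) == char   # code point -> character
    print(f"{char!r} is U+{code_point:04X}")

# The code space runs from U+0000 through U+10FFFF.
total_code_points = sys.maxunicode + 1
print(total_code_points)
```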

Now, how do you represent that in a file or whatever? Well, you could use 32 
bits for every character. Not very efficient, but certainly straightforward. 
That is called UTF-32. It's not very common.
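
As a rough illustration, assuming Python 3 and its built-in "utf-32-be" codec 
(big-endian, no byte order mark), the fixed four-bytes-per-character layout 
looks like this:

```python
text = "Aô€"

# "utf-32-be" is big-endian UTF-32 without a byte order mark.
encoded = text.encode("utf-32-be")

# Every character occupies exactly four bytes, however small its code point.
units = [encoded[i:i + 4].hex() for i in range(0, len(encoded), 4)]

print(len(encoded))
print(units)
```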

You could use 16 bits for every character, with some sort of cleverness that 
yielded two 16-bit words when you had a code point bigger than 65535 (actually 
somewhat less due to how the cleverness works). That is called UTF-16. Pretty 
good but still not very efficient.
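
The "cleverness" is the surrogate pair. The values X'D800' through X'DFFF' are 
reserved for it, which is why the single-word range is somewhat less than 
65536. A minimal sketch of the arithmetic, assuming Python 3 (the helper name 
to_utf16_units is made up for this example, and it does no validation of its 
input):

```python
def to_utf16_units(code_point):
    """Return the 16-bit UTF-16 code units for one code point (no validation)."""
    if code_point < 0x10000:
        return [code_point]              # fits in a single 16-bit word
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return [high, low]


def codec_units(code_point):
    """Same result obtained from Python's own UTF-16 codec, as a cross-check."""
    raw = chr(code_point).encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


for cp in (0x41, 0x20AC, 0x1F600):
    units = to_utf16_units(cp)
    assert units == codec_units(cp)
    print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```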

You could use 8 bits for the most common characters, with cleverness that 
expands that out to two, three, or four bytes for everything else. Pretty 
efficient, and you could make the first part of the character set the same as 
ASCII, which would make it intuitive for PC folks who "know" that A is X'41'. 
That is called UTF-8, and it's pretty good and pretty popular as a result. Most 
Web pages are in UTF-8 and I believe this e-mail came to you in UTF-8.
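
A quick sketch of the variable length, assuming Python 3 and its built-in 
"utf-8" codec. Note that characters beyond U+FFFF take four bytes. The 
variable names are just for this example.

```python
# One character from each UTF-8 length class.
chars = ["A", "ô", "€", "\U0001F600"]

utf8_bytes = {c: c.encode("utf-8") for c in chars}
utf8_lengths = [len(utf8_bytes[c]) for c in chars]

for c in chars:
    encoded = utf8_bytes[c]
    print(f"U+{ord(c):04X} -> {encoded.hex()} ({len(encoded)} byte(s))")

# The ASCII range is unchanged: A is still X'41'.
assert utf8_bytes["A"] == b"\x41"
```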

Okay?

Now, define "keep it intact." Do you mean bit for bit intact, or do you mean 
"so that when I open it up in ISPF, what looked like an A on the PC now looks 
like an A in ISPF"? If the former, you want a binary transfer, end of story. If 
the latter, you don't really want to keep it intact, you want to translate 
Unicode -- and you will need to know which flavor of Unicode encoding (not what 
country) -- to EBCDIC, which is what ISPF and most COBOL programs expect.
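
Here is a sketch of the "translate" case, assuming Python 3, assuming the 
incoming data is UTF-8, and assuming the target is EBCDIC code page 037 
(Python's "cp037" codec). Both assumptions are mine for the example; your data 
may be in another encoding and your site may use a different EBCDIC code page. 
The function name utf8_to_ebcdic is made up. On z/OS itself you would more 
likely let the transfer tool or a conversion service do this; the point is only 
that it is a decode step followed by an encode step.

```python
def utf8_to_ebcdic(data, ebcdic_codec="cp037"):
    """Translate UTF-8 bytes to an EBCDIC code page (cp037 is an assumption)."""
    # Step 1: bytes -> code points. Fails loudly if the data is not valid UTF-8.
    text = data.decode("utf-8")
    # Step 2: code points -> EBCDIC bytes. Raises UnicodeEncodeError if a
    # character has no equivalent in the chosen code page.
    return text.encode(ebcdic_codec)


pc_bytes = "A".encode("utf-8")         # X'41' on the PC side
host_bytes = utf8_to_ebcdic(pc_bytes)  # X'C1' in EBCDIC
print(pc_bytes.hex(), "->", host_bytes.hex())

# Accented letters survive the round trip because cp037 contains them.
name = "Zoé Côté"
assert utf8_to_ebcdic(name.encode("utf-8")).decode("cp037") == name
```

A character that the target code page lacks (the euro sign in cp037, for 
instance) raises an error rather than translating silently, which is why you 
need to pick the code page with your actual data in mind.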

Comprende?

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Scott Ford
Sent: Thursday, January 09, 2014 4:36 PM
To: [email protected]
Subject: Subject Unicode

All:
 
I have a fundamental question about Unicode, or rather about how it works. I am 
confused about the following scenario: PC data (using a foreign-language 
Unicode page, like French) going to z/OS and being kept intact. Name- and 
address-type data. As the application, do I have to query the incoming data, 
find out what the Unicode CECP is, and then translate to the desired one? Or 
how does it work?

