Oh goody. A character sets question.
On 06/12/16 18:20, Scott Ford wrote:
I found the problem we are using an IBM TBL EZACICTR which doesnt support
CP 437, duh ....
Bummer.
Today the role of Lynn Wheeler will be played by /moi/ as I give some
interesting (to me) history related to this topic. (Salted with
entertaining embellishment because my facts just aren't as detailed or
interesting as his.)
I have a bigger question, if we wanted to support Unicode (yeah ugh), how
do I know what CCSIDS to support ?
The problem with Unicode is that it's not an 8-bit codepage. Code points
run up to U+10FFFF, so a fixed-width encoding (UTF-32) takes 32 bits per
character. It is the solution to all of our planet-wide problems because
"there's room for everyone!".
I like UTF-8 where you get an 8-bit wide byte stream. (And there are no
worries over endianness.) But that doesn't suit everyone. 8-bit bytes
don't even work for program source code anymore! Eight bit bummer.
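To make the endianness point concrete, here is a small sketch in Python
(my choice for illustration; nothing in the original exchange uses it).
It encodes one short string as UTF-8, UTF-16 and UTF-32. Only the wider
forms come in big-endian and little-endian flavours; the UTF-8 bytes are
the same everywhere.

```python
# One string, several Unicode encoding forms.
text = "A\u00e9\u20ac"  # 'A', e-acute, euro sign

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")
utf32_be = text.encode("utf-32-be")
utf32_le = text.encode("utf-32-le")

forms = [
    ("utf-8", utf8),
    ("utf-16-be", utf16_be),
    ("utf-16-le", utf16_le),
    ("utf-32-be", utf32_be),
    ("utf-32-le", utf32_le),
]

# Show the length and raw bytes of each form.
for name, data in forms:
    print(f"{name:10} {len(data):2} bytes  {data.hex()}")
```

The UTF-8 form is variable width (one, two and three bytes for these
three characters), which is the price paid for staying a plain byte
stream.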
For example we go from EBCDIC on z/OS to ASCII and from ASCII to EBCDIC.
Do I some how have to tell the target what the sending CCSID is ?
Yes. (But I more often see the codepage numbers than some CCSID.)
Without better knowledge of your data and the environment, I can only
recommend circling near "EBCDIC is CP1047" and "ASCII is ISO-8859-1". If
your stuff is US and most of Western Europe, that works. (Not so helpful
for the Russians or the Greeks or anyone East of them.)
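A quick way to see why the receiver has to be told the sending codepage:
the Python sketch below encodes one line of text under CP 37 and then
decodes it under CP 500. It uses cp037 and cp500 only because those
EBCDIC codecs ship with Python; as far as I know the standard library
has no cp1047 codec, so that one is not shown. The sample line is made
up.

```python
# Same bytes on the wire, two different guesses at the sender's codepage.
line = "if (a[i] != 0) x ^= 1;"

wire = line.encode("cp037")      # what the sender actually used
right = wire.decode("cp037")     # receiver was told the truth
wrong = wire.decode("cp500")     # receiver assumed "EBCDIC is EBCDIC"

print(right)
print(wrong)

# Letters, digits and blanks sit at the same code points in both pages,
# which is why this kind of mismatch can go unnoticed for a long time.
plain = "HELLO WORLD 123"
same_bytes = plain.encode("cp037") == plain.encode("cp500")
print(same_bytes)
```

The damage is confined to a handful of punctuation characters, which is
exactly what hurts in program source and scripts.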
The Story
We've enjoyed this hemorrhoid for decades.
Dirty little secret: IBM was one of the backers of ASCII in the 1960s.
The S/360 had an ASCII/EBCDIC switch. But there was too much momentum
from Hollerith history. OS/360 and its siblings continued using EBCDIC. So
the nifty A/E HW bit got re-purposed. Besides, we can fix everything in
software, right? Ahh, those were the days. If only 16M were enough.
Twenty-four bit addressing mode bummer.
In the late 1980s, Edwin Hart, then at Johns Hopkins Applied Physics and
active with SHARE, spearheaded a customer effort to _distill common
practice_ into consistency. The result was
*SHARE Report SSD No. 366*:
ASCII and EBCDIC Character Set and Code Issues in Systems Application
Architecture,
The ASCII/EBCDIC Character Set Task Force.
Edited by Edwin Hart,
The Johns Hopkins University,
Applied Physics Laboratory,
Laurel, Maryland, USA;
published by Share Inc.,
111 East Wacker Drive, Chicago, Illinois, USA 60601;
*June 1989*
The effect was what some called "Codepage 37 version 2". Most mainframe
sites were using either CP 37 or CP 500 (or subsets), neither of which
mapped correctly to de-facto EBCDIC (for common translations to/from
ASCII). CP 37 was the closer of the two. With minor code point
re-assignment, a codepage floated to the surface which many of us
rabidly skimmed off and ran with.
IBM took the SHARE report to heart. Mostly. They soon blessed us with CP
1047, the standard on USS, even now. Codepage 1047 is closer to the
legendary and mythical CP 37v2, but still off by two points. It switches
/not/ and /hat/ (circumflex, shift 6 on your US PC keyboard). Makes a
/mess/ of code and scripts which use either of those characters.
Thirty-two bit bummer.
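The not/hat swap is easy to picture as a two-entry patch to a translate
table. The Python sketch below rests on an assumption not stated above:
that the two code points involved are X'5F' and X'B0', which is where
the published CP 37 and CP 1047 tables disagree on these two characters.
It starts from Python's cp037 codec and only illustrates the swap; it is
not a full CP 1047 or CP 37v2 table, and decode_swapped is a made-up
name.

```python
# Assumption: X'5F' and X'B0' are the two code points that trade places.
SWAP = bytes.maketrans(b"\x5f\xb0", b"\xb0\x5f")


def decode_swapped(data: bytes) -> str:
    """Decode as cp037 after exchanging the two assumed code points."""
    return data.translate(SWAP).decode("cp037")


sample = "a ^ b".encode("cp037")

print(sample.decode("cp037"))   # as the sender meant it
print(decode_swapped(sample))   # hat has turned into not
```

Two bytes out of 256 is a small difference, but both characters are
operators in one language or another, so the result is a different
program rather than a cosmetic glitch.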
Interestingly, this unofficial _CP 37v2 persists_. At least one ISV of
note (I won't say which, but Dave Rivers might chime in) continues using
an official pair of translate tables that /work consistently/ between
z/OS and Unix/Linux/Windows. And there was much rejoicing.
I can offer these ...
http://www.casita.net/pub/aecs.h
http://www.casita.net/pub/aecs.c
No warranties expressed or implied. In fact, I recommend /not/ using the
C routines for anything more than reference. (Code-up something in
assembler and let the hardware do the grunt work.)
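For reference, the table-driven approach looks like this in Python:
build a 256-byte table once, then translate whole buffers with a single
call, which is roughly what a TR instruction does for you in assembler.
The tables here are derived from Python's cp037 and latin-1 codecs as
stand-ins. They are /not/ the tables in aecs.h, and the names E2A, A2E
and the two functions are made up for this sketch.

```python
# Build both 256-byte translate tables once, up front.
ALL_BYTES = bytes(range(256))

# EBCDIC (cp037 stand-in) to ISO-8859-1, and the reverse direction.
E2A = ALL_BYTES.decode("cp037").encode("latin-1")
A2E = ALL_BYTES.decode("latin-1").encode("cp037")


def ebcdic_to_ascii(buf: bytes) -> bytes:
    """Translate a buffer byte-for-byte using the E2A table."""
    return buf.translate(E2A)


def ascii_to_ebcdic(buf: bytes) -> bytes:
    """Translate a buffer byte-for-byte using the A2E table."""
    return buf.translate(A2E)


message = b"Hello, world"
round_trip = ebcdic_to_ascii(ascii_to_ebcdic(message))
print(ascii_to_ebcdic(message).hex())
print(round_trip)
```

Because each table is a permutation of all 256 byte values, the pair
round-trips every byte, which is the property you want from a matched
set of translate tables.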
Tagging text with one codepage or another is madness.
But assuming EBCDIC is always one thing and ASCII always an invariant
other paints you into a corner.
Eventually we will get to Unicode, and the chief cause of problems is
solutions.
Sixty-four bit bummer.
-- R; <><
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN