Ref:  Your note of Wed, 18 Oct 2017 11:06:58 -0600

As someone with deep experience in code page conversion, I must
admit I find HLASM inconsistent and dreadfully US-centric, but
it's hard to see how to fix it without breaking compatibility.
(I did at least manage to add some ASCII support to SUPERC during
my previous spell in the HLASM team a few years ago, supporting
conversion between all single-byte CECP and ECECP EBCDIC code
pages and ASCII 819.)

The static tables used to translate Linux ASCII source to EBCDIC,
and by TRANSLATE(AS) to translate character strings to ASCII, both
currently map between EBCDIC US CECP code page 37 and ASCII ISO
8859-1 code page 819, using a fully reversible mapping.
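To make the idea of a full reversible mapping concrete, here is a
minimal Python sketch that builds a 256-byte EBCDIC 37 to ASCII 819
table pair from Python's own cp037 and latin-1 codecs. It assumes
those codec tables are close enough to HLASM's built-in tables for
illustration (control characters in particular may be assigned
differently), and build_tables is a hypothetical helper, not part
of HLASM.

```python
# Sketch only: derive EBCDIC 37 <-> ISO 8859-1 (ASCII 819) translate
# tables from Python's codecs. build_tables is a hypothetical helper.
# Python's "cp037" maps every byte to a code point in U+0000..U+00FF,
# so the EBCDIC -> 819 direction is a permutation of 0..255.

def build_tables(ebcdic_codec: str = "cp037") -> tuple[bytes, bytes]:
    """Return (ebcdic_to_ascii, ascii_to_ebcdic) 256-byte tables."""
    # Decode all 256 EBCDIC byte values, then re-encode as Latin-1 (819).
    e2a = bytes(range(256)).decode(ebcdic_codec).encode("latin-1")
    # Invert the permutation to get the ASCII -> EBCDIC direction.
    a2e = bytearray(256)
    for ebcdic_byte, ascii_byte in enumerate(e2a):
        a2e[ascii_byte] = ebcdic_byte
    return e2a, bytes(a2e)

E2A, A2E = build_tables()

# A round trip through both tables returns the original bytes.
sample = "HELLO, WORLD!".encode("cp037")
ascii_bytes = sample.translate(E2A)
print(ascii_bytes)                           # b'HELLO, WORLD!'
print(ascii_bytes.translate(A2E) == sample)  # True
```

Because every one of the 256 values has a distinct image, nothing
is lost in either direction, which is what "fully reversible" means
here.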

The static ASCII table used for the CA data type currently maps
a subset of EBCDIC 37 to the printable 7-bit US ASCII subset of
ASCII 819 (that is, x'20' through x'7E') and leaves all other
EBCDIC codes unchanged.  Even though this does not seem very
helpful, leaving non-ASCII codes unchanged is a documented
feature, so the impact of changing it to a full 256-byte table,
as I would like to do, is not entirely clear.
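The sketch below models the CA behaviour as described, assuming the
"subset" is exactly the EBCDIC 37 characters whose 819 equivalents
fall in x'20' through x'7E'; the real HLASM table may differ in
detail, and build_ca_table is a hypothetical name. It shows why such
a table cannot be reversed: an untranslated byte can collide with a
translated one.

```python
# Sketch only: model the CA-style table described above.
# build_ca_table is hypothetical; the real HLASM table may differ.
# EBCDIC 37 characters whose 819 equivalent is printable 7-bit ASCII
# (x'20'..x'7E') are translated; every other byte is left unchanged.

def build_ca_table(ebcdic_codec: str = "cp037") -> bytes:
    table = bytearray(range(256))          # start with the identity mapping
    for ebcdic_byte in range(256):
        ch = bytes([ebcdic_byte]).decode(ebcdic_codec)
        if 0x20 <= ord(ch) <= 0x7E:        # printable US ASCII only
            table[ebcdic_byte] = ord(ch)
    return bytes(table)

CA = build_ca_table()

# EBCDIC x'40' (space) becomes x'20', but EBCDIC x'20' is a non-ASCII
# code that is left unchanged, so it also ends up as x'20'.
print(hex(CA[0x40]), hex(CA[0x20]))        # 0x20 0x20

# Because of such collisions the table is not reversible.
print(len(set(CA)) < 256)                  # True
```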

The CODEPAGE option currently only controls the table used to
translate CU character constants from EBCDIC to Unicode (UCS-2).
At present, tables are only provided for the ECECP (Euro) EBCDIC
code pages.  Apart from the Euro character, each table is
equivalent to translating to ASCII 819 then adding a leading zero
byte.  However, for some unknown reason the default EBCDIC table
is 1148 (ECECP International 1), which apart from the Euro is
equivalent to the non-Euro CECP code page 500.  Code page 500 is
based on ASCII compatibility and is significantly different from
code page 37: for example, the positions that hold cent, vertical
bar, exclamation mark and "not" sign in 37 hold left square
bracket, exclamation mark, right square bracket and caret (or
"roof") in 500.
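A rough sketch of both points, using Python codecs as stand-ins. It
models a CU constant as EBCDIC decoded to Unicode and encoded as
big-endian UCS-2, checks that this matches "translate to 819, then
prefix a zero byte" for non-Euro text, and compares 37 with 500 at
the four positions mentioned. As far as I know Python's standard
library has no 1148 codec, so cp1140 (the Euro variant of 37) and
cp500 are used instead; cu_constant and via_819 are hypothetical
helpers, not HLASM interfaces.

```python
# Sketch only: cu_constant and via_819 are hypothetical helpers.

def cu_constant(ebcdic: bytes, codec: str) -> bytes:
    # EBCDIC -> Unicode -> big-endian UCS-2.
    # UTF-16BE coincides with UCS-2 for characters in the BMP.
    return ebcdic.decode(codec).encode("utf-16-be")

def via_819(ebcdic: bytes, codec: str) -> bytes:
    # Translate to ISO 8859-1, then prefix each byte with x'00'.
    # Raises UnicodeEncodeError for the Euro, which is not in 819.
    latin1 = ebcdic.decode(codec).encode("latin-1")
    return b"".join(b"\x00" + bytes([b]) for b in latin1)

text = "Price: 42".encode("cp1140")
print(cu_constant(text, "cp1140") == via_819(text, "cp1140"))  # True

# The Euro sign (x'9F' in 1140) becomes U+20AC, which has no 819 form.
print(cu_constant(b"\x9f", "cp1140").hex())   # 20ac

# Same byte values, different characters in 37 and 500.
same_bytes = bytes([0x4A, 0x4F, 0x5A, 0x5F])
print(same_bytes.decode("cp037"))   # ¢|!¬
print(same_bytes.decode("cp500"))   # [!]^
```

The last two lines show why a default of 1148 rather than 1140 can
surprise users whose source is really in code page 37.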

I think my preferred solution would be to specify the EBCDIC code
page for all purposes using the CODEPAGE option (or perhaps a new
CCSID option) and to assume that ASCII generally means ISO 8859-1
code page 819, with a full 256-byte translate table.  (A CCSID
option might even make it possible to convert mixed DBCS constants
to UTF-8 in the distant future.)  The main complication, however,
is how to get there without requiring lots of new options, while
making it as easy as possible to get right and avoiding breaking
compatibility.
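As a purely hypothetical illustration of deriving everything from a
single code page setting, the sketch below builds both the 819 table
and the per-byte UCS-2 values from one codec name. tables_for and
its dictionary keys are invented for this example and do not
correspond to any HLASM option or interface.

```python
# Hypothetical sketch: derive all tables from one code page setting.
# tables_for and its keys are illustrative names, not HLASM interfaces.

def tables_for(codepage: str) -> dict:
    chars = bytes(range(256)).decode(codepage)
    # A full 256-byte EBCDIC -> ASCII 819 table exists only if every
    # character in the code page is also in ISO 8859-1.
    try:
        to_819 = chars.encode("latin-1")
    except UnicodeEncodeError:
        to_819 = None
    # CU constants: one big-endian UCS-2 value per EBCDIC byte.
    to_ucs2 = [c.encode("utf-16-be") for c in chars]
    return {"to_819": to_819, "to_ucs2": to_ucs2}

for cp in ("cp037", "cp500", "cp1140"):
    t = tables_for(cp)
    full = t["to_819"] is not None
    print(cp, "full 819 table" if full else "no full 819 table")
# cp037 full 819 table
# cp500 full 819 table
# cp1140 no full 819 table
```

One thing the sketch makes visible: an ECECP page such as 1140 has
no full 256-byte mapping to 819, because the Euro sign is outside
ISO 8859-1, so a real design would need a rule for that character.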

Regards
Jonathan Scott
HLASM team, IBM Hursley, UK
