Ref: Your note of Wed, 18 Oct 2017 11:06:58 -0600

As someone with deep experience in code page conversion I must admit I find HLASM inconsistent and dreadfully US-centric, but it's hard to see how to fix it without breaking compatibility. (I did at least manage to add some ASCII support to SUPERC when I was previously in the HLASM team a few years ago, supporting conversion between all single-byte CECP and ECECP EBCDIC code pages and ASCII 819).

The static tables used to translate Linux ASCII source to EBCDIC, and used by TRANSLATE(AS) to translate character strings to ASCII, currently both map between EBCDIC US CECP code page 37 and ASCII ISO 8859-1 code page 819, using a fully reversible mapping.

The static ASCII table used for the CA data type currently maps a subset of EBCDIC 37 to the printable 7-bit US ASCII subset of ASCII 819 (that is, x'20' through x'7E') and leaves all other EBCDIC codes unchanged. Although this does not seem very helpful, leaving non-ASCII codes unchanged is a documented feature, so it's not entirely clear what the impact might be of changing it to a full 256-byte table, as I would like to do.

The CODEPAGE option currently controls only the table used to translate CU character constants from EBCDIC to Unicode (UCS-2). At present, tables are provided only for the ECECP (Euro) EBCDIC code pages. Apart from the Euro character, each table is equivalent to translating to ASCII 819 and then adding a leading zero byte. However, for some unknown reason the default EBCDIC table is 1148 (ECECP International 1), which corresponds to the non-Euro CECP code page 500. That code page is based on ASCII compatibility and is significantly different from code page 37: for example, the code points that hold cent, vertical bar, exclamation mark and "not" sign in code page 37 hold left square bracket, exclamation mark, right square bracket and caret (or "roof") respectively in code page 500.

I think that my preferred solution would be to specify the EBCDIC code page for all purposes using the CODEPAGE option (or perhaps a new CCSID option) and to assume that ASCII generally means ISO 8859-1 code page 819, with a full 256-byte translate table. (A CCSID option might even make it possible to convert mixed DBCS constants to UTF-8 in the distant future). However, the main complication is how to get to that position without requiring lots of new options, while making it as easy as possible to get right and without breaking compatibility.

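To make the table differences concrete, here is a rough sketch in Python rather than HLASM. It uses Python's built-in cp037, cp500 and latin-1 codecs as stand-ins for the tables described above (Python has no codec for 1148, so cp500 stands in for it apart from the Euro character). The helper names are hypothetical, and the subset table is only my reading of the documented CA behaviour, not the actual HLASM table.

```python
# Illustration only: Python's built-in codecs stand in for the HLASM tables.
# All helper names here are hypothetical and are not part of HLASM.

def full_table(ebcdic_codec: str) -> bytes:
    """256-byte table mapping an EBCDIC code page to ISO 8859-1 (ASCII 819)."""
    return bytes(range(256)).decode(ebcdic_codec).encode("latin-1")


def subset_table(ebcdic_codec: str) -> bytes:
    """CA-style table (assumed behaviour): translate only codes that map to
    printable 7-bit ASCII (x'20' through x'7E'); leave all others unchanged."""
    full = full_table(ebcdic_codec)
    return bytes(a if 0x20 <= a <= 0x7E else e for e, a in enumerate(full))


def differences(codec_a: str, codec_b: str) -> dict:
    """Code points whose characters differ between two EBCDIC code pages."""
    result = {}
    for cp in range(256):
        char_a = bytes([cp]).decode(codec_a)
        char_b = bytes([cp]).decode(codec_b)
        if char_a != char_b:
            result[cp] = (char_a, char_b)
    return result


def to_ucs2(data: bytes, ebcdic_codec: str) -> bytes:
    """EBCDIC to big-endian UCS-2, as for a CU constant (Euro aside)."""
    return data.decode(ebcdic_codec).encode("utf-16-be")


if __name__ == "__main__":
    # Show where code pages 37 and 500 disagree.
    for cp, (c37, c500) in sorted(differences("cp037", "cp500").items()):
        print(f"x'{cp:02X}': 37={c37!r} 500={c500!r}")
```

With a full table, `bytes.translate` applied twice (EBCDIC to ASCII, then back with the inverse table) returns the original data, which is the reversibility that the subset table gives up.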
Regards

Jonathan Scott
HLASM team, IBM Hursley, UK
