Ref: Your note of Wed, 18 Oct 2017 21:02:15 -0400

At present, CA constants always generate one ASCII byte per EBCDIC source byte and CU constants always generate two bytes (UCS-2) per source byte. Supporting any form of translation which might result in different numbers of output bytes for different input bytes (such as translation to UTF-8, which I think would be very useful) would require new syntax and new semantics.

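
To illustrate the fixed-width versus variable-width point, here is a minimal sketch in Python rather than HLASM. It is only an analogy: the Python codecs stand in for the assembler's translation, `cp037` is an arbitrary example EBCDIC code page, and `latin-1` (code page 819) stands in for the single-byte output of a CA constant. None of these choices reflect what HLASM does internally.

```python
# Illustration only: Python codecs stand in for constant translation.
# Assumptions: cp037 as the EBCDIC source code page, latin-1 (819) as the
# single-byte target, utf-16-be as a stand-in for UCS-2.
text = "A\u00a3\u00e9"                    # 'A', pound sign, e-acute

source = text.encode("cp037")             # EBCDIC SBCS: one byte per character
single_byte = text.encode("latin-1")      # CA-style: one output byte per source byte
two_byte = text.encode("utf-16-be")       # CU-style: two output bytes per source byte
utf8 = text.encode("utf-8")               # variable: depends on each character

# Output length per source character for UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in text]

print(len(source), len(single_byte), len(two_byte), len(utf8), utf8_lengths)
```

The first three lengths are fixed multiples of the source length, whereas the UTF-8 length depends on which characters appear, which is why a constant type with a predictable length cannot simply be reused for it.
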
The CODEPAGE option currently indicates the input code page (although this information is only used for Unicode conversion) and the TRANSLATE option determines the output code page, which only supports SBCS translate tables.

I note that although the combination of TRANSLATE and TRANSDT translates character self-defining terms in Assembler expressions, it does not currently apply to similar terms in conditional assembly SETA expressions. Although this is a bit inconsistent, I think that this supports my feeling that we currently only need to support CA and CE (and perhaps CU) self-defining terms in Assembler expressions, not SETA expressions.

Although I have a lot of experience with z/OS Unicode Conversion Services (in CICS internals) I am not keen on using them in HLASM, because they are platform-dependent and because they are mostly useful for mixed-length characters such as DBCS SO/SI and UTF-8, for which conversion is not currently supported by HLASM. The call overheads for trivial cases such as SBCS translation also have a significant performance impact; for CICS we get round that by calling the z/OS service with all 256 possible input byte values to set up lookaside tables on the first use of each CCSID combination.

The lookaside tables consist of a TRT table to stop on any input byte which doesn't translate to a single output byte (so requires the use of the z/OS service) and a translate table to translate all supported single-byte codes. If the TRT table is all zero, a flag is set to indicate that a simple TR can be used, and if it is all non-zero, a flag is set to indicate that all input must be passed to the z/OS service. This means for example that when CICS is translating between UTF-8 and EBCDIC, it can use a TRT to check for the 7-bit ASCII subset of UTF-8 and if so just use a TR instead of calling the z/OS service.

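
The lookaside scheme described above can be sketched roughly as follows. This is a Python approximation, not the CICS code: `build_lookaside`, `convert`, `probe_utf8_byte` and `full_service` are hypothetical names, the Python codecs stand in for the z/OS conversion service, and `cp037` is an arbitrary example EBCDIC code page. The `stop` table plays the role of the TRT table and `xlate` the role of the TR table.

```python
def build_lookaside(probe):
    """Probe all 256 byte values once. probe(b) returns the converted
    bytes for a single input byte, or None if it cannot be converted alone."""
    stop = bytearray(256)     # TRT-style: non-zero means "needs the full service"
    xlate = bytearray(256)    # TR-style: single-byte translations
    for b in range(256):
        out = probe(b)
        if out is not None and len(out) == 1:
            xlate[b] = out[0]
        else:
            stop[b] = 1
    all_simple = not any(stop)    # table all zero: a plain translate always works
    none_simple = all(stop)       # table all non-zero: always use the full service
    return stop, xlate, all_simple, none_simple


def convert(data, tables, full_service):
    """Use the fast table translate when every byte allows it."""
    stop, xlate, all_simple, none_simple = tables
    if none_simple:
        return full_service(data)
    if all_simple or not any(stop[b] for b in data):   # TRT-style scan
        return data.translate(xlate)                   # TR-style translate
    return full_service(data)


# Hypothetical stand-ins for the conversion service (UTF-8 to cp037).
def probe_utf8_byte(b):
    try:
        return bytes([b]).decode("utf-8").encode("cp037")
    except UnicodeError:
        return None               # lead or continuation byte: not valid alone

service_calls = []

def full_service(data):
    service_calls.append(data)
    return data.decode("utf-8").encode("cp037")


tables = build_lookaside(probe_utf8_byte)
fast = convert(b"HELLO", tables, full_service)                       # table path
slow = convert("\u00a35".encode("utf-8"), tables, full_service)      # service path
```

In this sketch, pure 7-bit input never reaches `full_service`, while input containing any multi-byte UTF-8 sequence is passed to it whole.
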
SUPERC supports conversion from ASCII 819 to all 21 CECP and ECECP EBCDIC code pages itself without requiring a lot of table storage by having a base translate table (for code page 819 to 1047) and a list of the differences for each additional supported EBCDIC code page, generated by a CMS Pipelines program. The tables for ASCII to EBCDIC conversion are built at start-up by copying the base table and applying the differences for the specified code page (which for running under ISPF defaults to the terminal code page if it is in the supported list). The Euro is not fully supported in this case, as it is not present in 819, so the equivalent non-Euro table is used in which the ASCII "currency sign" maps to the EBCDIC Euro.

The HLASM CODEPAGE option is currently only supported at start-up mainly because it is implemented by loading the appropriate load module for the selected table. If the translate table could be built dynamically from internal information, this would allow it to be controlled by a *PROCESS statement.

As usual, please note that the fact that we are looking at this doesn't guarantee that anything will actually be done about it in the near future!

Jonathan Scott
HLASM team, IBM Hursley, UK
