Re: ASCII self-defining constants

Jonathan Scott Thu, 19 Oct 2017 02:10:16 -0700

Ref:  Your note of Wed, 18 Oct 2017 21:02:15 -0400

At present, CA constants always generate one ASCII byte per
EBCDIC source byte and CU constants always generate two bytes
(UCS-2) per source byte.  Supporting any form of translation
which might result in different numbers of output bytes for
different input bytes (such as translation to UTF-8, which I
think would be very useful) would require new syntax and new
semantics.
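
To illustrate the difference in output length, here is a small Python
sketch (not HLASM). It assumes Python's cp037 codec as a stand-in for the
EBCDIC source code page and "utf-16-be" as a stand-in for UCS-2; the
helper name output_lengths is hypothetical.

```python
# Illustration only: cp037 stands in for the EBCDIC source code page.
SOURCE = "Aé".encode("cp037")  # two EBCDIC source bytes


def output_lengths(src: bytes, target: str) -> list[int]:
    """Return the number of output bytes produced for each source byte."""
    return [len(bytes([b]).decode("cp037").encode(target)) for b in src]


if __name__ == "__main__":
    print("UCS-2 :", output_lengths(SOURCE, "utf-16-be"))
    print("UTF-8 :", output_lengths(SOURCE, "utf-8"))
```

With UCS-2 every source byte produces two output bytes, whereas with
UTF-8 the "A" produces one byte and the "é" produces two, so the length
of the generated constant can no longer be derived from the source
length alone.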


The CODEPAGE option currently indicates the input code page
(although this information is only used for Unicode conversion)
and the TRANSLATE option determines the output code page, for
which only SBCS translate tables are supported.

I note that although the combination of TRANSLATE and TRANSDT
translates character self-defining terms in Assembler
expressions, it does not currently apply to similar terms in
conditional assembly SETA expressions. Although this is a bit
inconsistent, I think that this supports my feeling that we
currently only need to support CA and CE (and perhaps CU)
self-defining terms in Assembler expressions, not SETA
expressions.

Although I have a lot of experience with z/OS Unicode Conversion
Services (in CICS internals), I am not keen on using them in
HLASM, because they are platform-dependent and because they are
mostly useful for mixed-length characters such as DBCS SO/SI and
UTF-8, for which conversion is not currently supported by HLASM.
The call overheads for trivial cases such as SBCS translation
also have a significant performance impact; for CICS we get round
that by calling the z/OS service with all 256 possible input byte
values to set up lookaside tables on the first use of each CCSID
combination. The lookaside tables consist of a TRT table to stop
on any input byte which doesn't translate to a single output byte
(so requires the use of the z/OS service) and a translate table
to translate all supported single-byte codes. If the TRT table
is all zero, a flag is set to indicate that a simple TR can be
used, and if it is all non-zero, a flag is set to indicate that
all input must be passed to the z/OS service. This means, for
example, that when CICS is translating between UTF-8 and EBCDIC,
it can use a TRT to check whether the input lies entirely within
the 7-bit ASCII subset of UTF-8 and, if so, just use a TR instead
of calling the z/OS service.
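
The lookaside scheme described above can be sketched roughly as follows.
This is Python rather than assembler, and it is only an illustration
under stated assumptions: slow_convert is a hypothetical stand-in for the
z/OS conversion service (implemented here with Python codecs), the
function and mode names are invented, and bytes.translate plays the role
of TR while the scan over trt plays the role of TRT.

```python
def slow_convert(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical stand-in for the (expensive) platform conversion service."""
    return data.decode(src).encode(dst)


def build_lookaside(src: str, dst: str):
    """Probe all 256 input byte values once and build TR/TRT-style tables."""
    tr = bytearray(256)   # translate table for bytes that map 1:1
    trt = bytearray(256)  # non-zero entry: byte needs the full service
    for b in range(256):
        try:
            out = slow_convert(bytes([b]), src, dst)
        except UnicodeError:
            out = b""     # not convertible as a lone byte
        if len(out) == 1:
            tr[b] = out[0]
        else:
            trt[b] = 1
    if not any(trt):
        mode = "TR_ONLY"        # a simple translate always suffices
    elif all(trt):
        mode = "SERVICE_ONLY"   # the tables never help
    else:
        mode = "MIXED"          # scan first, then decide
    return bytes(tr), bytes(trt), mode


def convert(data: bytes, src: str, dst: str, tables) -> bytes:
    """Use the fast table path when every input byte allows it."""
    tr, trt, mode = tables
    if mode == "TR_ONLY":
        return data.translate(tr)
    if mode == "MIXED" and not any(trt[b] for b in data):
        return data.translate(tr)  # e.g. pure 7-bit ASCII within UTF-8
    return slow_convert(data, src, dst)


if __name__ == "__main__":
    tables = build_lookaside("utf-8", "cp037")  # built once per code page pair
    print(tables[2])
    print(convert(b"HELLO", "utf-8", "cp037", tables).hex())
```

In real use the tables would be cached per CCSID combination, so the
256 probing calls are only paid on the first use of each pair.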

SUPERC supports conversion from ASCII 819 to all 21 CECP and
ECECP EBCDIC code pages itself without requiring a lot of table
storage by having a base translate table (for code page 819 to
1047) and a list of the differences for each additional supported
EBCDIC code page, generated by a CMS Pipelines program.  The
tables for ASCII to EBCDIC conversion are built at start-up by
copying the base table and applying the differences for the
specified code page (which for running under ISPF defaults to the
terminal code page if it is in the supported list).  The Euro is
not fully supported in this case, as it is not present in 819, so
the equivalent non-Euro table is used in which the ASCII
"currency sign" maps to the EBCDIC Euro.
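
The base-table-plus-differences idea might look something like the
following Python sketch. It is not SUPERC's actual code or data: the
tables are derived from Python's codecs purely for illustration, with
latin-1 standing in for code page 819 and cp037 standing in for the 1047
base table (Python has no built-in 1047 codec), and all names are
hypothetical.

```python
def full_table(codec: str) -> bytes:
    """256-byte ASCII (latin-1) to EBCDIC translate table for one codec."""
    return bytes(range(256)).decode("latin-1").encode(codec)


def differences(base: bytes, other: bytes) -> list[tuple[int, int]]:
    """List of (index, value) pairs where 'other' differs from 'base'."""
    return [(i, other[i]) for i in range(256) if other[i] != base[i]]


def build_table(base: bytes, diffs: list[tuple[int, int]]) -> bytes:
    """Copy the base table and apply the differences for one code page."""
    table = bytearray(base)
    for index, value in diffs:
        table[index] = value
    return bytes(table)


# Assumption: cp037 is used as the base here only because it is available.
BASE = full_table("cp037")

# In the scheme described above the difference lists are generated offline;
# here they are computed on the spot so that the sketch is self-contained.
DIFFS = {"cp500": differences(BASE, full_table("cp500"))}

if __name__ == "__main__":
    table = build_table(BASE, DIFFS["cp500"])
    print(len(DIFFS["cp500"]), "entries differ from the base table")
    print(b"[x]".translate(table).hex())
```

Only the base table and the short difference lists need to be stored;
the full table for the selected code page is rebuilt at start-up.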

The HLASM CODEPAGE option is currently only supported at
start-up, mainly because it is implemented by loading the
appropriate load module for the selected table.  If the
translate table could be built dynamically from internal
information, this would allow it to be controlled by a *PROCESS
statement.

As usual, please note that the fact that we are looking at this
doesn't guarantee that anything will actually be done about it in
the near future!

Jonathan Scott
HLASM team, IBM Hursley, UK
