On 19 October 2017 at 04:05, Jonathan Scott <[email protected]> wrote:

> At present, CA constants always generate one ASCII byte per
> EBCDIC source byte and CU constants always generate two bytes
> (UCS-2) per source byte.  Supporting any form of translation
> which might result in different numbers of output bytes for
> different input bytes (such as translation to UTF-8, which I
> think would be very useful) would require new syntax and new
> semantics.

Fair enough. It was just an example to use the CA constant type.

> I note that although the combination of TRANSLATE and TRANSDT
> translates character self-defining terms in Assembler
> expressions, it does not currently apply to similar terms in
> conditional assembly SETA expressions. Although this is a bit
> inconsistent, I think that this supports my feeling that we
> currently only need to support CA and CE (and perhaps CU)
> self-defining terms in Assembler expressions, not SETA
> expressions.

I am tempted to play with this using external functions...

> Although I have a lot of experience with z/OS Unicode Conversion
> Services (in CICS internals) I am not keen on using them in
> HLASM, because they are platform-dependent

Are they? I really don't know, but I would expect that they exist in a
reasonably compatible form on the other two EBCDIC platforms, z/VM and
z/VSE. zLinux is another story, of course. But all UNIXy systems have
iconv(), which should be able to do much the same things, though
perhaps not as heavily optimized.

> The call overheads for trivial cases such as SBCS translation
> also have a significant performance impact; for CICS we get round
> that by calling the z/OS service with all 256 possible input byte
> values to set up lookaside tables on the first use of each CCSID
> combination. The lookaside tables consist of a TRT table to stop
> on any input byte which doesn't translate to a single output byte
> (so requires the use of the z/OS service) and a translate table
> to translate all supported single-byte codes. If the TRT table
> is all zero, a flag is set to indicate that a simple TR can be
> used, and if it is all non-zero, a flag is set to indicate that
> all input must be passed to the z/OS service. This means for
> example that when CICS is translating between UTF-8 and EBCDIC,
> it can use a TRT to check for the 7-bit ASCII subset of UTF-8
> and if so just use a TR instead of calling the z/OS service.

Interesting... A couple of years ago I looked into what Unicode
services was actually doing when we called it in one of our products
where we translate between EBCDIC and UTF-8. I was pleasantly
surprised to find that once you've made the initial setup call, the
overhead of actually translating is almost zero. There is no SVC or
PC; it's entirely user state and key code that obtains an ALET that
was set up by the Init call for the required translation tables (which
will be built from others if needed), and uses the appropriate
instruction(s) - in our case TROT/TRTO and the various Convert Unicode
variations - to do the actual work.
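
For anyone who wants to experiment, here is a rough Python sketch of the
lookaside scheme described in the quoted text. It is only my reading of that
description, not CICS code: Python's codecs stand in for the z/OS service,
bytes.translate() stands in for TR, the scan over the stop table stands in
for TRT, and the names build_lookaside and convert are made up for the
illustration.

```python
def build_lookaside(src, dst):
    """Probe all 256 input byte values once and build two 256-byte tables.

    stop[b] is non-zero when byte b does not convert to exactly one output
    byte on its own (the role of the TRT table); table[b] holds the output
    byte for every byte that does (the role of the TR table).
    """
    stop = bytearray(256)
    table = bytearray(range(256))
    for b in range(256):
        try:
            out = bytes([b]).decode(src).encode(dst)
        except UnicodeError:
            out = b""  # not convertible as a lone byte
        if len(out) == 1:
            table[b] = out[0]
        else:
            stop[b] = 1
    return bytes(stop), bytes(table)


def convert(data, src, dst, stop, table):
    """Convert data, using the table when possible and the full service otherwise."""
    if not any(stop):
        # Stop table is all zero: a plain table translate always suffices.
        return data.translate(table)
    if all(stop) or any(stop[b] for b in data):
        # Either nothing is table-convertible, or this input contains a
        # byte that needs the full conversion service.
        return data.decode(src).encode(dst)
    return data.translate(table)


# Assumed example pair: UTF-8 to EBCDIC code page 037.
STOP, TABLE = build_lookaside("utf-8", "cp037")
```

With these tables, convert(b"HELLO", "utf-8", "cp037", STOP, TABLE) takes the
table path, while input containing a multibyte UTF-8 sequence falls through
to the full conversion.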

> SUPERC supports conversion from ASCII 819 to all 21 CECP and
> ECECP EBCDIC code pages itself without requiring a lot of table
> storage by having a base translate table (for code page 819 to
> 1047) and a list of the differences for each additional supported
> EBCDIC code page, generated by a CMS Pipelines program.  The
> tables for ASCII to EBCDIC conversion are built at start-up by
> copying the base table and applying the differences for the
> specified code page (which for running under ISPF defaults to the
> terminal code page if it is in the supported list).  The Euro is
> not fully supported in this case, as it is not present in 819, so
> the equivalent non-Euro table is used in which the ASCII
> "currency sign" maps to the EBCDIC Euro.

While all the above may once have been a nifty idea, I think it's
almost certainly not one now. Unicode services are very efficient, and
will take care of the sort of two-stage translation you mention, if
necessary. For a simple single-byte to single-byte translation like
819<->1047, the call overhead is probably about 10 instructions plus
one TROO instruction. It's very little more for multibyte translation.
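
For comparison, the base-table-plus-differences scheme from the quoted text
looks roughly like this in Python. This is a sketch under assumptions, not
SUPERC's code: Python's standard codecs do not include code page 1047, so
cp037 stands in as the base table and cp500 as the additional code page, and
the difference list is computed here from the codecs rather than generated
by a CMS Pipelines program. The function names are invented for the example.

```python
def full_table(ebcdic):
    """256-byte translate table from ISO 8859-1 (code page 819) to an EBCDIC codec."""
    return bytes(range(256)).decode("latin-1").encode(ebcdic)


def differences(base, other):
    """List of (input byte, output byte) pairs where 'other' departs from 'base'."""
    return [(i, other[i]) for i in range(256) if base[i] != other[i]]


def build_table(base, diffs):
    """Start-up step: copy the base table, then apply the differences."""
    table = bytearray(base)
    for index, value in diffs:
        table[index] = value
    return bytes(table)


# Base table stored in full; each additional code page is stored only as
# its short list of differences from the base.
BASE = full_table("cp037")
DIFFS_500 = differences(BASE, full_table("cp500"))

# At start-up, rebuild the full table for the requested code page.
TABLE_500 = build_table(BASE, DIFFS_500)
```

The storage saving comes from DIFFS_500 being much shorter than a full
256-byte table; TABLE_500 ends up identical to the table built directly
from the cp500 codec.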

> The HLASM CODEPAGE option is currently only supported at
> start-up mainly because it is implemented by loading the
> appropriate load module for the selected table.  If the
> translate table could be built dynamically from internal
> information, this would allow it to be controlled by a *PROCESS
> statement.

Did I remember to mention Unicode conversion services... :-)

> As usual, please note that the fact that we are looking at this
> doesn't guarantee that anything will actually be done about it in
> the near future!

Yup - we hear you.

Tony H.
