I am responsible for a product that uses Unicode Services to do a *lot* of translation, typically EBCDIC to UTF-8, which is non-trivial. I measured the overhead. I was pleasantly surprised.
On a z13s, with an optimized C++ compile, the total CPU time to translate 4016 records with an average length of 366 bytes each was 14.570 CPU milliseconds, or about 3.63 CPU microseconds per record. That figure includes the overhead of the time measurement calls, about .115 CPU microseconds per record. So that's about 3.5 CPU microseconds per 366-byte translate. I was only interested in "my case" so I did not break it out into 'x' microseconds per call + 'y' microseconds per byte.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]] On Behalf Of Tony Harminc
Sent: Thursday, October 19, 2017 9:14 AM
To: [email protected]
Subject: Re: ASCII self-defining constants

On 19 October 2017 at 04:05, Jonathan Scott <[email protected]> wrote:

> At present, CA constants always generate one ASCII byte per EBCDIC
> source byte and CU constants always generate two bytes
> (UCS-2) per source byte. Supporting any form of translation which
> might result in different numbers of output bytes for different input
> bytes (such as translation to UTF-8, which I think would be very
> useful) would require new syntax and new semantics.

Fair enough. It was just an example to use the CA constant type.

> I note that although the combination of TRANSLATE and TRANSDT
> translates character self-defining terms in Assembler expressions, it
> does not currently apply to similar terms in conditional assembly SETA
> expressions. Although this is a bit inconsistent, I think that this
> supports my feeling that we currently only need to support CA and CE
> (and perhaps CU) self-defining terms in Assembler expressions, not
> SETA expressions.

I am tempted to play with this using external functions...

> Although I have a lot of experience with z/OS Unicode Conversion
> Services (in CICS internals) I am not keen on using them in HLASM,
> because they are platform-dependent

Are they?
I really don't know, but I would expect that they exist in a reasonably compatible form on the other two EBCDIC platforms, zVM and zVSE. zLinux is another story, of course. But all UNIXy systems have iconv(), which should be able to do much the same things, though perhaps not optimized as much.

> The call overheads for trivial cases such as SBCS translation also
> have a significant performance impact; for CICS we get round that by
> calling the z/OS service with all 256 possible input byte values to
> set up lookaside tables on the first use of each CCSID combination.
> The lookaside tables consist of a TRT table to stop on any input byte
> which doesn't translate to a single output byte (so requires the use
> of the z/OS service) and a translate table to translate all supported
> single-byte codes. If the TRT table is all zero, a flag is set to
> indicate that a simple TR can be used, and if it is all non-zero, a
> flag is set to indicate that all input must be passed to the z/OS
> service. This means for example that when CICS is translating between
> UTF-8 and EBCDIC, it can use a TRT to check for the 7-bit ASCII subset
> of UTF-8 and if so just use a TR instead of calling the z/OS service.

Interesting...

A couple of years ago I looked into what Unicode services was actually doing when we called it in one of our products where we translate between EBCDIC and UTF-8. I was pleasantly surprised to find that once you've made the initial setup call, the overhead of actually translating is almost zero. There is no SVC or PC; it's entirely user state and key code that obtains an ALET that was set up by the Init call for the required translation tables (which will be built from others if needed), and uses the appropriate instruction(s) - in our case TROT/TRTO and the various Convert Unicode variations - to do the actual work.
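
For anyone who wants to experiment with the lookaside-table idea Jonathan describes above, here is a rough Python sketch of it. It is only an illustration, not the CICS code: `full_service`, `build_lookaside` and `make_translator` are made-up names, Python's built-in `cp037` codec stands in for the z/OS conversion service, and `bytes.translate` plays the part of both TRT (with the stop table) and TR (with the translate table). As a simplification, a buffer containing any "stop" byte is handed to the full service as a whole.

```python
def full_service(data: bytes) -> bytes:
    """Hypothetical stand-in for the real conversion service (EBCDIC cp037 to UTF-8)."""
    return data.decode("cp037").encode("utf-8")


def build_lookaside(convert):
    """Probe the service with all 256 byte values, once per CCSID pair."""
    stop = bytearray(256)   # TRT-style table: non-zero means "needs the full service"
    table = bytearray(256)  # TR-style table for bytes that map to exactly one byte
    for b in range(256):
        try:
            out = convert(bytes([b]))
        except UnicodeError:
            out = b""       # treat an unconvertible byte as a stop byte
        if len(out) == 1:
            table[b] = out[0]
        else:
            stop[b] = 1
    return bytes(stop), bytes(table)


def make_translator(convert):
    """Return the cheapest translator that the lookaside tables allow."""
    stop, table = build_lookaside(convert)
    if not any(stop):
        # Stop table is all zero: a simple TR is always enough.
        return lambda data: data.translate(table)
    if all(stop):
        # Stop table is all non-zero: everything must go to the service.
        return convert

    def translate(data: bytes) -> bytes:
        # TRT analog: map each input byte to its stop flag and look for a hit.
        if any(data.translate(stop)):
            return convert(data)
        # No stop bytes present, so the TR analog alone is sufficient.
        return data.translate(table)

    return translate


ebcdic_to_utf8 = make_translator(full_service)
print(ebcdic_to_utf8("HELLO 123".encode("cp037")))
```

In this sketch the tables are built once when `make_translator` is called, so later calls for input that maps byte-for-byte never reach `full_service` at all.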
