Edward Jaffe wrote:
> The following fragment should work if you prefer looping
> TRE over traditional TR. TRE requires you to manually
> translate the so-called "stop" character with an MVC.
> But, at least there's no EXecute for the final segment.
>
>          LM    R14,R15,xxxxxx    Load string ptr and its length
>          LA    R1,xxxxxx         Ptr to translation table
>          XR    R0,R0             Set stop char = x'00'
>          DO    INF               Do for translate
>          TRE   R14,R1            Translate the string
>          DOEXIT Z                Exit if no more data
>          IF    O                 If iterate needed
>          ITERATE ,               Process another segment
>          ENDIF ,                 EndIf
>          MVC   0(1,R14),0(R1)    Translate x'00' to whatever
>          LA    R14,1(,R14)       Advance past stop character
>          AHI   R15,-1            Decrement length remaining
>          DOEXIT NP               Exit if no more data
>          ENDDO ,                 EndDo for translate
Art Celestini wrote:
> It seems that the TRE instruction has been in z/Arch for at
> least a few years. If anyone is inclined to try this:
>
>          XR    R1,R1             Clear for insert
>          L     R15,Length        Load string length
> Loop     IC    R1,Input-1(R15)   Get input byte
>          IC    R0,XlatTab(R1)    Get translated character ...
>          STC   R0,Output-1(R15)  ... and store it in output
>          BCT   R15,Loop          Decrement length & loop until done
>
> it would be interesting to see how it fares against
> Ed Jaffe's code.
I did this, since I had a program I could just plug these code
segments into without doing a lot of work. Results are below.
> I believe the OP said that the data to be translated had to
> first be moved from one buffer to another. The above does
> that, but a move of some type needs to be added to Ed's code
> to make it a true comparison.
Maybe, maybe not. I've got code that needs to translate stuff
in a buffer and does not need it moved. And I have other code
that first moves it and then translates it, because it doesn't
want to clobber what it's translating. But I did it both ways,
just to find out for sure whether it made a difference.
It does not. For any substantial number of bytes (which I
define as more than 256, since 256 or fewer can be handled
directly, inline, with a single TR instruction), the TRE loop
is so much faster that even the overhead of an MVCL does not
begin to eat into the gain. So the fact that a TRE loop
subroutine or macro you might whip up makes you move the data
first, if you do not want the original clobbered, is simply
not relevant from a performance perspective. Since there is no
use for the byte-by-byte (non-TRE) loop subroutine, because its
performance is horrible for any substantial number of bytes, we
are left with the TRE or TR subroutines. Both translate the data
directly in the buffer provided, which IMHO is what most
programmers would want to have available to call most of the
time anyway. If not, they would have had to move the data to
some other buffer before TR'ing it anyway.
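When the original data must be preserved, the move-then-translate approach described above might look like the following minimal sketch. It assumes 31-bit addressing, a buffer smaller than 16 MB (MVCL lengths are 24 bits), established addressability, and the TR subroutine listed further down; the names INBUF, WORKBUF, BUFLEN and XLATTAB are hypothetical.

```hlasm
*  HYPOTHETICAL NAMES: INBUF, WORKBUF, BUFLEN, XLATTAB
         LA    R2,WORKBUF        MVCL TARGET ADDRESS (EVEN REG)
         L     R3,BUFLEN         MVCL TARGET LENGTH (ODD REG)
         LA    R4,INBUF          MVCL SOURCE ADDRESS (EVEN REG)
         LR    R5,R3             SOURCE LENGTH = TARGET, NO PADDING
         MVCL  R2,R4             COPY, LEAVING ORIGINAL INTACT
         LA    R14,WORKBUF       TRANSLATE THE COPY IN PLACE
         L     R15,BUFLEN        LENGTH OF THE COPY
         LA    R1,XLATTAB        256-BYTE TRANSLATE TABLE
         BAS   R8,TR             CALL THE TR LOOP SUBROUTINE
```

MVCL updates R2 through R5, and the TR subroutine alters R14 and R15, so a caller that needs those registers afterwards would have to save them first.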
As you will see below, the TRE loop was faster for me once I
gave it more than somewhere between 6 and 19 bytes (the
crossover varied from run to run). I'd never give it that few,
since for anything <= 256 I'd just code a TR inline. But if I
didn't know the length in advance, you can see that there is
plenty of CPU time left to test for 256 or less and do a TR
inline if so, or else call the TR[E] subroutine if there are
more than 256. Regardless, an ordinary TR loop is still faster
than using TRE. But this is what you would expect. The TR loop
code is not any more complicated than the TRE loop code in the
first place; it's just different. TRE does not replace TR. It's
for another purpose, basically, not for performance.
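The length test mentioned above (translate 256 bytes or fewer inline, otherwise call a subroutine) could be sketched as follows. This is illustrative only: it assumes R14, R15 and R1 are already loaded the way the TR subroutine listed below expects, and the labels XLCALL, XLDONE and XLINL are hypothetical.

```hlasm
*  ASSUMES R14 = BUFFER, R15 = LENGTH, R1 = 256-BYTE XLATE TABLE
         LTR   R15,R15           ANYTHING TO TRANSLATE?
         BNP   XLDONE            NO, SKIP IT
         CHI   R15,256           MORE THAN 256 BYTES?
         BH    XLCALL            YES, USE THE TR LOOP SUBROUTINE
         BCTR  R15,0             NO, LENGTH-1 FOR THE EXECUTE
         EX    R15,XLINL         TRANSLATE 1-256 BYTES INLINE
         B     XLDONE            DONE
XLCALL   BAS   R8,TR             TRANSLATE MORE THAN 256 BYTES
XLDONE   DS    0H                CONTINUE HERE
*  EXECUTE TARGET, PLACED OUTSIDE THE NORMAL INSTRUCTION FLOW:
XLINL    TR    0(*-*,R14),0(R1)  LENGTH SUPPLIED BY EX
```

Because EX ORs the low-order byte of R15 into the instruction's length field, the target is assembled with a zero length (`*-*`), the same idiom the TR subroutine's TREX instruction uses.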
I revised the code above to suit my own personal taste and
needs. I improved the TRE subroutine proposed by Edward Jaffe
so that the caller can specify the "test" character. That way
performance does not suffer if the data to be translated
contains a lot of null bytes, as it would with Ed's version,
which always stops on X'00'. That meant the MVC had to become
an IC + STC, because the test character's offset into the
translate table is no longer a fixed zero and MVC cannot be
indexed.
Here is the code for the subroutines I called repeatedly to
gather the timing figures:
**------------------------------------------------------------------
**
** NOTE: ENTER VIA BAS R8,NOTR WITH REGS SET AS FOLLOWS:
** R14 = INPUT BUFFER ADDRESS
** R15 = OUTPUT BUFFER ADDRESS
** R0 = LENGTH OF BOTH INPUT AND OUTPUT BUFFER (MAY BE ZERO)
** R1 = 256-BYTE TRANSLATE TABLE ADDRESS
**
**------------------------------------------------------------------
NOTR     LTR   R2,R0             COPY LENGTH AND TEST FOR ZERO
         BZR   R8                RETURN IMMEDIATELY IF LENGTH=0
         BCTR  R15,0             1 BYTE IN FRONT OF OUTPUT BUFFER
         BCTR  R14,0             1 BYTE IN FRONT OF INPUT BUFFER
         XR    R3,R3             CLEAR FOR IC (USED AS INDEX REG)
LOOP     IC    R3,0(R2,R14)      GET 1 BYTE STARTING FROM THE END
         IC    R0,0(R3,R1)       TRANSLATE THAT BYTE USING TABLE
         STC   R0,0(R2,R15)      PUT TRANSLATED BYTE IN OUTPUT BFR
         BCT   R2,LOOP           DECREMENT COUNT, LOOP IF MORE TO DO
         BR    R8                RETURN TO CALLER
**------------------------------------------------------------------
**
** NOTE: ENTER VIA BAS R8,TRE WITH REGS SET AS FOLLOWS:
** R14 = BUFFER ADDRESS
** R15 = LENGTH OF BUFFER (MAY BE ZERO)
** R0 (LOW-ORDER BYTE) = TEST CHARACTER. CAN BE ANY CHARACTER,
**        BUT FOR PERFORMANCE REASONS IT SHOULD BE ONE THAT
**        DOES NOT APPEAR IN THE BUFFER, OR IS THE LEAST LIKELY
**        TO APPEAR THERE. NOTE THAT, IN MOST INSTANCES, THE
**        X'00' (NULL) AND X'40' (BLANK) CHARACTERS ARE NOT
**        LIKELY TO BE THE BEST CHOICE FOR THIS.
** R1 = 256-BYTE TRANSLATE TABLE ADDRESS
**
**------------------------------------------------------------------
TRE      LHI   R2,X'FF'          SET R2 = X'000000FF'
         NR    R2,R0             ISOLATE TEST CHARACTER
TREL     TRE   R14,R1            TRANSLATE THE STRING
         BZR   R8                CC0: ALL DONE, RETURN
         BO    TREL              CC3: CPU STOPPED EARLY, REISSUE
         IC    R3,0(R2,R1)       CC1: TEST CHAR FOUND, GET ITS
         STC   R3,0(,R14)        TRANSLATION AND STORE IT IN PLACE
         LA    R14,1(,R14)       ADVANCE PAST TEST CHARACTER
         AHI   R15,-1            DECREMENT LENGTH REMAINING
         BP    TREL              TRANSLATE THE REST IF MORE DATA
         BR    R8                RETURN TO CALLER
**-------------------------------------------------------------------
**
** NOTE: ENTER VIA BAS R8,TR WITH REGS SET AS FOLLOWS:
** R14 = BUFFER ADDRESS
** R15 = LENGTH OF BUFFER (MAY BE ZERO)
** R1 = 256-BYTE TRANSLATE TABLE ADDRESS
**
**-------------------------------------------------------------------
TR       AHI   R15,-1            -1 FOR 0-ORIGIN
         BMR   R8                RETURN IF LENGTH WAS NOT > 0
TRL      CHI   R15,256           256 CHARS (OR LESS) REMAIN?
         BL    TRLX              YES, GO TRANSLATE LAST PIECE
         TR    0(256,R14),0(R1)  NO, TRANSLATE FIRST/NEXT 256
         AHI   R15,-256          CALCULATE LENGTH REMAINING
         AHI   R14,256           INCREMENT BUFFER POINTER
         B     TRL               LOOP BACK TO DO NEXT 256 BYTES
TRLX     EX    R15,TREX          TRANSLATE LAST PIECE OF BUFFER
         BR    R8                RETURN TO CALLER
TREX     TR    0(*-*,R14),0(R1)  EXECUTED: TRANSLATE 1-256 BYTES
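For reference, a call to the TRE subroutine above might be set up as in this sketch. BUFFER, BUFLEN and XLATTAB are hypothetical names, and X'FF' is only an assumed example of a test character that rarely appears in the data; the best choice depends on what is actually being translated.

```hlasm
*  HYPOTHETICAL NAMES: BUFFER, BUFLEN, XLATTAB
         LA    R14,BUFFER        BUFFER TO TRANSLATE IN PLACE
         L     R15,BUFLEN        ITS LENGTH (MAY BE ZERO)
         LA    R1,XLATTAB        256-BYTE TRANSLATE TABLE
         LHI   R0,X'FF'          TEST CHAR, ASSUMED RARE IN DATA
         BAS   R8,TRE            TRANSLATE VIA THE TRE SUBROUTINE
```

On return, R14 and R15 have been stepped past the translated data by TRE itself, and R2 and R3 have been used as work registers by the subroutine.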
The following are the results I obtained on a 2094-714 / z9-109:
NO TR(E) USED 14.434348 SEC TO XLATE X'0800' BYTES 1,000,000 TIMES
TRE LOOP USED 1.213316 SEC TO XLATE X'0800' BYTES 1,000,000 TIMES
TR LOOP USED 1.030490 SEC TO XLATE X'0800' BYTES 1,000,000 TIMES
NO TR(E) USED 7.142552 SEC TO XLATE X'0400' BYTES 1,000,000 TIMES
TRE LOOP USED 0.683257 SEC TO XLATE X'0400' BYTES 1,000,000 TIMES
TR LOOP USED 0.477014 SEC TO XLATE X'0400' BYTES 1,000,000 TIMES
NO TR(E) USED 3.578476 SEC TO XLATE X'0200' BYTES 1,000,000 TIMES
TRE LOOP USED 0.445753 SEC TO XLATE X'0200' BYTES 1,000,000 TIMES
TR LOOP USED 0.200682 SEC TO XLATE X'0200' BYTES 1,000,000 TIMES
NO TR(E) USED 1.797026 SEC TO XLATE X'0100' BYTES 1,000,000 TIMES
TRE LOOP USED 0.239344 SEC TO XLATE X'0100' BYTES 1,000,000 TIMES
TR LOOP USED 0.031291 SEC TO XLATE X'0100' BYTES 1,000,000 TIMES
NO TR(E) USED 1.351607 SEC TO XLATE X'00C0' BYTES 1,000,000 TIMES
TRE LOOP USED 0.210312 SEC TO XLATE X'00C0' BYTES 1,000,000 TIMES
TR LOOP USED 0.031306 SEC TO XLATE X'00C0' BYTES 1,000,000 TIMES
NO TR(E) USED 0.906534 SEC TO XLATE X'0080' BYTES 1,000,000 TIMES
TRE LOOP USED 0.198645 SEC TO XLATE X'0080' BYTES 1,000,000 TIMES
TR LOOP USED 0.031321 SEC TO XLATE X'0080' BYTES 1,000,000 TIMES
NO TR(E) USED 0.462027 SEC TO XLATE X'0040' BYTES 1,000,000 TIMES
TRE LOOP USED 0.114426 SEC TO XLATE X'0040' BYTES 1,000,000 TIMES
TR LOOP USED 0.031289 SEC TO XLATE X'0040' BYTES 1,000,000 TIMES
NO TR(E) USED 0.238364 SEC TO XLATE X'0020' BYTES 1,000,000 TIMES
TRE LOOP USED 0.102487 SEC TO XLATE X'0020' BYTES 1,000,000 TIMES
TR LOOP USED 0.031304 SEC TO XLATE X'0020' BYTES 1,000,000 TIMES
NO TR(E) USED 0.224285 SEC TO XLATE X'001E' BYTES 1,000,000 TIMES
TRE LOOP USED 0.106776 SEC TO XLATE X'001E' BYTES 1,000,000 TIMES
TR LOOP USED 0.031290 SEC TO XLATE X'001E' BYTES 1,000,000 TIMES
NO TR(E) USED 0.210406 SEC TO XLATE X'001C' BYTES 1,000,000 TIMES
TRE LOOP USED 0.108267 SEC TO XLATE X'001C' BYTES 1,000,000 TIMES
TR LOOP USED 0.031289 SEC TO XLATE X'001C' BYTES 1,000,000 TIMES
NO TR(E) USED 0.196512 SEC TO XLATE X'001A' BYTES 1,000,000 TIMES
TRE LOOP USED 0.107565 SEC TO XLATE X'001A' BYTES 1,000,000 TIMES
TR LOOP USED 0.031326 SEC TO XLATE X'001A' BYTES 1,000,000 TIMES
NO TR(E) USED 0.182702 SEC TO XLATE X'0018' BYTES 1,000,000 TIMES
TRE LOOP USED 0.101648 SEC TO XLATE X'0018' BYTES 1,000,000 TIMES
TR LOOP USED 0.031298 SEC TO XLATE X'0018' BYTES 1,000,000 TIMES
NO TR(E) USED 0.168835 SEC TO XLATE X'0016' BYTES 1,000,000 TIMES
TRE LOOP USED 0.096894 SEC TO XLATE X'0016' BYTES 1,000,000 TIMES
TR LOOP USED 0.031372 SEC TO XLATE X'0016' BYTES 1,000,000 TIMES
NO TR(E) USED 0.154890 SEC TO XLATE X'0014' BYTES 1,000,000 TIMES
TRE LOOP USED 0.097152 SEC TO XLATE X'0014' BYTES 1,000,000 TIMES
TR LOOP USED 0.031300 SEC TO XLATE X'0014' BYTES 1,000,000 TIMES
NO TR(E) USED 0.140851 SEC TO XLATE X'0012' BYTES 1,000,000 TIMES
TRE LOOP USED 0.098532 SEC TO XLATE X'0012' BYTES 1,000,000 TIMES
TR LOOP USED 0.031297 SEC TO XLATE X'0012' BYTES 1,000,000 TIMES
NO TR(E) USED 0.126914 SEC TO XLATE X'0010' BYTES 1,000,000 TIMES
TRE LOOP USED 0.069528 SEC TO XLATE X'0010' BYTES 1,000,000 TIMES
TR LOOP USED 0.031306 SEC TO XLATE X'0010' BYTES 1,000,000 TIMES
NO TR(E) USED 0.113049 SEC TO XLATE X'000E' BYTES 1,000,000 TIMES
TRE LOOP USED 0.072118 SEC TO XLATE X'000E' BYTES 1,000,000 TIMES
TR LOOP USED 0.031289 SEC TO XLATE X'000E' BYTES 1,000,000 TIMES
NO TR(E) USED 0.099111 SEC TO XLATE X'000C' BYTES 1,000,000 TIMES
TRE LOOP USED 0.071591 SEC TO XLATE X'000C' BYTES 1,000,000 TIMES
TR LOOP USED 0.031296 SEC TO XLATE X'000C' BYTES 1,000,000 TIMES
NO TR(E) USED 0.085172 SEC TO XLATE X'000A' BYTES 1,000,000 TIMES
TRE LOOP USED 0.071743 SEC TO XLATE X'000A' BYTES 1,000,000 TIMES
TR LOOP USED 0.031312 SEC TO XLATE X'000A' BYTES 1,000,000 TIMES
NO TR(E) USED 0.071273 SEC TO XLATE X'0008' BYTES 1,000,000 TIMES
TRE LOOP USED 0.068731 SEC TO XLATE X'0008' BYTES 1,000,000 TIMES
TR LOOP USED 0.029558 SEC TO XLATE X'0008' BYTES 1,000,000 TIMES
NO TR(E) USED 0.064337 SEC TO XLATE X'0007' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057373 SEC TO XLATE X'0007' BYTES 1,000,000 TIMES
TR LOOP USED 0.029557 SEC TO XLATE X'0007' BYTES 1,000,000 TIMES
NO TR(E) USED 0.057381 SEC TO XLATE X'0006' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057379 SEC TO XLATE X'0006' BYTES 1,000,000 TIMES
TR LOOP USED 0.029560 SEC TO XLATE X'0006' BYTES 1,000,000 TIMES
NO TR(E) USED 0.050431 SEC TO XLATE X'0005' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057365 SEC TO XLATE X'0005' BYTES 1,000,000 TIMES
TR LOOP USED 0.029547 SEC TO XLATE X'0005' BYTES 1,000,000 TIMES
NO TR(E) USED 0.043480 SEC TO XLATE X'0004' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057370 SEC TO XLATE X'0004' BYTES 1,000,000 TIMES
TR LOOP USED 0.029558 SEC TO XLATE X'0004' BYTES 1,000,000 TIMES
NO TR(E) USED 0.036505 SEC TO XLATE X'0003' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057389 SEC TO XLATE X'0003' BYTES 1,000,000 TIMES
TR LOOP USED 0.029558 SEC TO XLATE X'0003' BYTES 1,000,000 TIMES
NO TR(E) USED 0.029555 SEC TO XLATE X'0002' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057363 SEC TO XLATE X'0002' BYTES 1,000,000 TIMES
TR LOOP USED 0.029559 SEC TO XLATE X'0002' BYTES 1,000,000 TIMES
NO TR(E) USED 0.022616 SEC TO XLATE X'0001' BYTES 1,000,000 TIMES
TRE LOOP USED 0.057450 SEC TO XLATE X'0001' BYTES 1,000,000 TIMES
TR LOOP USED 0.029030 SEC TO XLATE X'0001' BYTES 1,000,000 TIMES
NO TR(E) USED 0.008419 SEC TO XLATE X'0000' BYTES 1,000,000 TIMES
TRE LOOP USED 0.016833 SEC TO XLATE X'0000' BYTES 1,000,000 TIMES
TR LOOP USED 0.005217 SEC TO XLATE X'0000' BYTES 1,000,000 TIMES
As you can see, in this run the non-TR[E] (that is, the
byte-by-byte) version beats the TRE code at 5 characters or
fewer and ties it at 6. On other runs I made, the byte-by-byte
version stayed ahead up to as many as 19 characters, so I
suspect the breakpoint is anywhere between 5 and 24. But for
more than 256 characters the TRE loop certainly runs a lot
faster than the byte-by-byte version! As most bright folks here
should naturally expect, however, nothing beats an ordinary TR
loop, in the same manner that an ordinary MVC loop usually
beats an MVCL, except for a very large number of bytes.
If anybody wants the entire program so that they can run it on
their own machine, or change the data I translated, please let
me know. It does not depend on any macros other than what is in
SYS1.MACLIB.
--
WB
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html