William: Thank you for taking the time to give this a try. I had heard some horror stories about TR performance being "disappointing" on some earlier z/Arch machines and I was wondering if the problem was pervasive. Obviously not.
Not to be a nit-picker, but the OP (Kirk Wolf) said, "I'm looking for the fastest way in assembler to translate data in one buffer to another using a 256-byte translate table," which is part of what prompted me to suggest the open-code solution that I did, since it includes a "move" from one buffer to another as part of the process. I'm convinced that TRE and TR are faster but it seems that a truly "fair" comparison of solutions to the stated problem should have included equivalent moves in the TRE and TR solutions.

-- Art C.

At 05:33 AM 3/26/2008, William H. Blair wrote:
>Edward Jaffe wrote:
>
>> The following fragment should work if you prefer looping
>> TRE over traditional TR. TRE requires you to manually
>> translate the so-called "stop" character with an MVC.
>> But, at least there's no EXecute for the final segment.
>>
>>      LM    R14,R15,xxxxxx     Load string ptr and its length
>>      LA    R1,xxxxxx          Ptr to translation table
>>      XR    R0,R0              Set stop char = x'00'
>>      DO INF                   Do for translate
>>        TRE   R14,R1             Translate the string
>>        DOEXIT Z                 Exit if no more data
>>        IF O                     If iterate needed
>>          ITERATE ,                Process another segment
>>        ENDIF ,                  EndIf
>>        MVC   0(1,R14),0(R1)     Translate x'00' to whatever
>>        LA    R14,1(,R14)        Advance past stop character
>>        AHI   R15,-1             Decrement length remaining
>>        DOEXIT NP                Exit if no more data
>>      ENDDO ,                  EndDo for translate
>
>Art Celestini wrote:
>
>> It seems that the TRE instruction has been in z/Arch for at
>> least a few years. If anyone is inclined to try this:
>>
>>          XR    R1,R1              Clear for insert
>>          L     R15,Length         Load string length
>> Loop     IC    R1,Input-1(R15)    Get input byte
>>          IC    R0,XlatTab(R1)     Get translated character ...
>>          STC   R0,Output-1(R15)   ... and store it in output
>>          BCT   R15,Loop           Decrement length & loop until done
>>
>> it would be interesting to see how it fares against
>> Ed Jaffe's code.
>
>I did this, since I had a program I could just plug these code
>segments into without doing a lot of work. Results are below.
>
>> I believe the OP said that the data to be translated had to
>> first be moved from one buffer to another. The above does
>> that, but a move of some type needs to be added to Ed's code
>> to make it a true comparison.
>
>Maybe, maybe not. I've got code that needs to translate stuff
>in a buffer and it does not need it moved. And I have other
>code that first moves it and then translates it, because it
>doesn't want to clobber what it's translating. But, I did it
>both ways, just to find out for sure if it made a difference.
>It does not. The TRE loop is so much faster for any substantial
>number of bytes (which I define as more than 256, since that
>number or less can be handled directly, inline, simply by using
>the TR instruction) that the overhead of even a MVCL does not
>even begin to eat into the gain by using a TRE loop. So, the
>fact that with a TRE loop subroutine or macro you might whip
>up you first have to move the data to be translated if you do
>not want the original data clobbered is simply not relevant
>from a performance perspective. Since there is no use for the
>non-TRE loop subroutine (because its performance is horrible
>for any substantial number of bytes), we are left with the TRE
>or TR subroutines, which translate the data directly in the
>buffer provided, which is what most programmers would want to
>have available to call most of the time anyway, IMHO. If not,
>then they would first have had to move the data to some other
>buffer before TR'ing it anyway.
>
>As you will see below, the TRE loop was faster for me when I
>gave it more than 7 to 19 bytes. I'd never give it that few
>since for anything <= 256 I'd just code a TR inline. But if
>I didn't know how many bytes, then you can see that there is
>plenty of CPU time left to test for 256 or less and do a TR
>inline if so, or else call the TR[E] subroutine if I had more
>than 256. Regardless, an ordinary TR loop is still faster
>than using TRE. But this is what you would expect.
>The TR loop code is not any more complicated than the TRE loop
>code in the first place. It's just different. TRE does not
>replace TR. It's for another purpose, basically, not for
>performance.
>
>I revised the code above to suit my own personal taste and
>needs. I made an improvement in the TRE subroutine proposed
>by Edward Jaffe to allow the caller to specify the "test"
>character, so that performance will not suffer if the data
>to be translated contains a lot of null bytes (as Ed's would).
>That meant that the MVC had to become an IC + STC.
>
>Here is the code for the subroutines I called repeatedly to
>gather the timing figures:
>
>**------------------------------------------------------------------
>**
>** NOTE: ENTER VIA BAS R8,NOTR WITH REGS SET AS FOLLOWS:
>**          R14 = INPUT BUFFER ADDRESS
>**          R15 = OUTPUT BUFFER ADDRESS
>**          R0  = LENGTH OF BOTH INPUT AND OUTPUT BUFFER (MAY BE ZERO)
>**          R1  = 256-BYTE TRANSLATE TABLE ADDRESS
>**
>**------------------------------------------------------------------
>NOTR     LTR   R2,R0              COPY LENGTH AND TEST FOR ZERO
>         BZR   R8                 RETURN IMMEDIATELY IF LENGTH=0
>         BCTR  R15,0              1 BYTE IN FRONT OF OUTPUT BUFFER
>         BCTR  R14,0              1 BYTE IN FRONT OF INPUT BUFFER
>         XR    R3,R3              CLEAR FOR IC (USED AS INDEX REG)
>LOOP     IC    R3,0(R2,R14)       GET 1 BYTE STARTING FROM THE END
>         IC    R0,0(R3,R1)        TRANSLATE THAT BYTE USING TABLE
>         STC   R0,0(R2,R15)       PUT TRANSLATED BYTE IN OUTPUT BFR
>         BCT   R2,LOOP            ADJUST LENGTH, LOOP IF MORE TO DO
>         BR    R8                 RETURN TO CALLER
>
>**------------------------------------------------------------------
>**
>** NOTE: ENTER VIA BAS R8,TRE WITH REGS SET AS FOLLOWS:
>**          R14 = BUFFER ADDRESS
>**          R15 = LENGTH OF BUFFER (MAY BE ZERO)
>**    [LOB] R0  = TEST CHARACTER. CAN BE ANY CHARACTER. BUT FOR
>**                PERFORMANCE REASONS, IT SHOULD BE ONE THAT IS
>**                THE LEAST LIKELY TO OR SIMPLY DOES NOT APPEAR
>**                IN THE BUFFER. NOTE THAT, IN MOST INSTANCES,
>**                THE X'00' (NULL) AND X'40' (BLANK) CHARACTERS
>**                ARE NOT LIKELY TO BE THE BEST CHOICE FOR THIS.
>**          R1  = 256-BYTE TRANSLATE TABLE ADDRESS
>**
>**------------------------------------------------------------------
>TRE      LHI   R2,X'FF'           SET R2 = X'000000FF'
>         NR    R2,R0              ISOLATE STOP CHARACTER
>TREL     TRE   R14,R1             TRANSLATE THE STRING
>         BZR   R8                 RETURN IF NO MORE DATA
>         BO    TREL               REISSUE IF MORE TO DO
>         IC    R3,0(R2,R1)        COPY BYTE IN TRANSLATE TABLE
>         STC   R3,0(,R14)         AT OFFSET OF TEST CHARACTER
>         LA    R14,1(,R14)        ADVANCE PAST TEST CHARACTER
>         AHI   R15,-1             DECREMENT LENGTH REMAINING
>         BP    TREL               CONTINUE PAST IT IF MORE DATA
>         BR    R8                 RETURN TO CALLER
>
>**-------------------------------------------------------------------
>**
>** NOTE: ENTER VIA BAS R8,TR WITH REGS SET AS FOLLOWS:
>**          R14 = BUFFER ADDRESS
>**          R15 = LENGTH OF BUFFER (MAY BE ZERO)
>**          R1  = 256-BYTE TRANSLATE TABLE ADDRESS
>**
>**-------------------------------------------------------------------
>TR       AHI   R15,-1             -1 FOR 0-ORIGIN
>         BMR   R8                 RETURN IF LENGTH WAS NOT > 0
>TRL      CHI   R15,256            256 CHARS (OR LESS) REMAIN?
>         BL    TRLX               YES, GO TRANSLATE LAST PIECE
>         TR    0(256,R14),0(R1)   NO, TRANSLATE FIRST/NEXT 256
>         AHI   R15,-256           CALCULATE LENGTH REMAINING
>         AHI   R14,256            INCREMENT BUFFER POINTER
>         B     TRL                LOOP BACK TO DO NEXT 256 BYTES
>TRLX     EX    R15,TREX           TRANSLATE LAST PIECE OF BUFFER
>         BR    R8
>TREX     TR    0(*-*,R14),0(R1)   TRANSLATE 256 BYTES OR LESS
>
>
>The following are the results I obtained on a 2094-714 / z9-109:
>
>NO TR(E) USED 14.434348 SEC TO XLATE X'0800' BYTES 1,000,000 TIMES
>TRE LOOP USED  1.213316 SEC TO XLATE X'0800' BYTES 1,000,000 TIMES
>TR  LOOP USED  1.030490 SEC TO XLATE X'0800' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  7.142552 SEC TO XLATE X'0400' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.683257 SEC TO XLATE X'0400' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.477014 SEC TO XLATE X'0400' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  3.578476 SEC TO XLATE X'0200' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.445753 SEC TO XLATE X'0200' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.200682 SEC TO XLATE X'0200' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  1.797026 SEC TO XLATE X'0100' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.239344 SEC TO XLATE X'0100' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031291 SEC TO XLATE X'0100' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  1.351607 SEC TO XLATE X'00C0' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.210312 SEC TO XLATE X'00C0' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031306 SEC TO XLATE X'00C0' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.906534 SEC TO XLATE X'0080' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.198645 SEC TO XLATE X'0080' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031321 SEC TO XLATE X'0080' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.462027 SEC TO XLATE X'0040' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.114426 SEC TO XLATE X'0040' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031289 SEC TO XLATE X'0040' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.238364 SEC TO XLATE X'0020' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.102487 SEC TO XLATE X'0020' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031304 SEC TO XLATE X'0020' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.224285 SEC TO XLATE X'001E' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.106776 SEC TO XLATE X'001E' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031290 SEC TO XLATE X'001E' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.210406 SEC TO XLATE X'001C' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.108267 SEC TO XLATE X'001C' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031289 SEC TO XLATE X'001C' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.196512 SEC TO XLATE X'001A' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.107565 SEC TO XLATE X'001A' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031326 SEC TO XLATE X'001A' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.182702 SEC TO XLATE X'0018' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.101648 SEC TO XLATE X'0018' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031298 SEC TO XLATE X'0018' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.168835 SEC TO XLATE X'0016' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.096894 SEC TO XLATE X'0016' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031372 SEC TO XLATE X'0016' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.154890 SEC TO XLATE X'0014' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.097152 SEC TO XLATE X'0014' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031300 SEC TO XLATE X'0014' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.140851 SEC TO XLATE X'0012' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.098532 SEC TO XLATE X'0012' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031297 SEC TO XLATE X'0012' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.126914 SEC TO XLATE X'0010' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.069528 SEC TO XLATE X'0010' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031306 SEC TO XLATE X'0010' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.113049 SEC TO XLATE X'000E' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.072118 SEC TO XLATE X'000E' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031289 SEC TO XLATE X'000E' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.099111 SEC TO XLATE X'000C' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.071591 SEC TO XLATE X'000C' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031296 SEC TO XLATE X'000C' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.085172 SEC TO XLATE X'000A' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.071743 SEC TO XLATE X'000A' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.031312 SEC TO XLATE X'000A' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.071273 SEC TO XLATE X'0008' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.068731 SEC TO XLATE X'0008' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029558 SEC TO XLATE X'0008' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.064337 SEC TO XLATE X'0007' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057373 SEC TO XLATE X'0007' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029557 SEC TO XLATE X'0007' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.057381 SEC TO XLATE X'0006' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057379 SEC TO XLATE X'0006' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029560 SEC TO XLATE X'0006' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.050431 SEC TO XLATE X'0005' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057365 SEC TO XLATE X'0005' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029547 SEC TO XLATE X'0005' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.043480 SEC TO XLATE X'0004' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057370 SEC TO XLATE X'0004' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029558 SEC TO XLATE X'0004' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.036505 SEC TO XLATE X'0003' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057389 SEC TO XLATE X'0003' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029558 SEC TO XLATE X'0003' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.029555 SEC TO XLATE X'0002' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057363 SEC TO XLATE X'0002' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029559 SEC TO XLATE X'0002' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.022616 SEC TO XLATE X'0001' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.057450 SEC TO XLATE X'0001' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.029030 SEC TO XLATE X'0001' BYTES 1,000,000 TIMES
>
>NO TR(E) USED  0.008419 SEC TO XLATE X'0000' BYTES 1,000,000 TIMES
>TRE LOOP USED  0.016833 SEC TO XLATE X'0000' BYTES 1,000,000 TIMES
>TR  LOOP USED  0.005217 SEC TO XLATE X'0000' BYTES 1,000,000 TIMES
>
>As you can see, for 6 characters or fewer (on different runs I
>made, the cutoff was as high as 19 characters, so the breakpoint
>is anywhere between 5 and 24, I suspect), the non-TR[E] (that is,
>the byte-by-byte) version runs at least as fast as the TRE code.
>But for more than 256 characters the TRE loop certainly runs a lot
>faster than the byte-by-byte version! As most bright folks here
>should naturally expect, however, nothing beats an ordinary TR
>loop, in the same manner that an ordinary MVC loop usually beats
>an MVCL, except for a very large number of bytes.
>
>If anybody wants the entire program so that they can run it on
>their machine, or change the data to be translated that I used,
>please let me know. It does not depend on any macros other than
>what is in SYS1.MACLIB.
>
==================================================
Art Celestini       Celestini Development Services
Phone: 201-670-1674                    Wyckoff, NJ
============= http://celestini.com =============
Mail sent to the "From" address used in this post
will be rejected by our server. Please send
off-list email to: ibmmain<at-sign>celestini<dot>com.
==================================================