Re: Long translate (TR) instruction?

Edward Jaffe Tue, 25 Mar 2008 19:12:00 -0700

Art Celestini wrote:

It seems that the TRE instruction has been in z/Arch for at least a few
years.  If anyone is inclined to try this, it would be interesting to see
how it fares against Ed Jaffe's code:


      XR   R1,R1             Clear for insert
      L    R15,Length        Load string length
Loop  IC   R1,Input-1(R15)   Get input byte
      IC   R0,XlatTab(R1)    Get translated character ...
      STC  R0,Output-1(R15)  ... and store it in output
      BCT  R15,Loop          Decrement length & loop until done

I believe the OP said that the data to be translated had to first be moved
from one buffer to another.  The above does that, but a move of some type
needs to be added to Ed's code to make it a true comparison.

Some years ago, on our z800 processor, we measured the performance of (in-place) TR against a software-coded loop. We found that the loop was faster than TR for strings shorter than nine (9) bytes in length. When we spoke to IBM about this, we learned that TR had been partially moved into millicode for the z900/z800. It ran slower for short strings because of the millicode start/stop (aka "subroutine linkage") costs. For strings longer than nine bytes, TR was faster because it had access to a hardware facility that could translate two bytes per cycle. The code fragments we compared were:


     |CASE1    DC    0H
     |         LA    R2,9
     |         LA    R3,DATA
     |         XR    R4,R4
     |CASE1L1  DS    0H
     |         IC    R4,0(,R3)
     |         IC    R4,EBCDIC(R4)
     |         STC   R4,0(,R3)
     |         AHI   R4,1
     |         AHI   R3,1
     |         JCT   R2,CASE1L1
     |CASE1L   EQU   *-CASE1


     |CASE2    DC    0H
     |         TR    DATA(9),EBCDIC
     |CASE2L   EQU   *-CASE2
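
For the buffer-to-buffer case raised in the quoted message, the TR side of the comparison would need a move ahead of the translate. A minimal sketch of what that might look like, assuming a hypothetical 9-byte OUTPUT field that does not overlap DATA (CASE3 and OUTPUT are illustrative names and were not part of the measured code):

```hlasm
CASE3    DS    0H
         MVC   OUTPUT(9),DATA      Copy the 9 input bytes (OUTPUT assumed)
         TR    OUTPUT(9),EBCDIC    Translate the copy in place
CASE3L   EQU   *-CASE3
```

DATA itself is left untranslated, which matches what a loop reading one buffer and storing into another does.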

We later "unrolled" the loop, interleaving the use of three different registers, and found it was now faster than TR for strings of 24 bytes or fewer!


     |Stride   EQU   3
     |CASE1    DC    0H
     |         LA    R0,9/Stride
     |         LA    R3,DATA
     |         XR    R4,R4
     |         XR    R5,R5
     |         XR    R6,R6
     |CASE1L1  DS    0H
     |         IC    R4,0(,R3)
     |         IC    R5,1(,R3)
     |         IC    R6,2(,R3)
     |         IC    R4,EBCDIC(R4)
     |         IC    R5,EBCDIC(R5)
     |         IC    R6,EBCDIC(R6)
     |         STC   R4,0(,R3)
     |         STC   R5,1(,R3)
     |         STC   R6,2(,R3)
     |         AHI   R3,Stride
     |         JCT   R0,CASE1L1
     |CASE1L   EQU   *-CASE1

The results of the above experiments suggest that your loop has an excellent chance of being faster than *any* sequence involving TR or TRE, for strings shorter than some number of bytes 'n', on any given hardware generation supporting z/Architecture.
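
One way to act on such a threshold is to choose the path at run time. The following is only a sketch under stated assumptions: LENGTH is a hypothetical fullword holding a length from 1 to 256, the R0-R15 register equates are defined, and CUTOFF is set to 9 purely because that was the z800 figure above; the real value of 'n' would have to be measured on each machine. All labels other than DATA and EBCDIC are illustrative.

```hlasm
CUTOFF   EQU   9                   Assumed threshold, measure per machine
XLATE    DS    0H
         L     R15,LENGTH          Load string length (assumed 1..256)
         CHI   R15,CUTOFF          Longer than the cutoff?
         JH    XLATETR             Yes, let TR do the work
         XR    R1,R1               Clear for insert
XLATEL   IC    R1,DATA-1(R15)      Get input byte
         IC    R1,EBCDIC(R1)       Replace it with translated byte
         STC   R1,DATA-1(R15)      Store it back in place
         JCT   R15,XLATEL          Decrement length & loop until done
         J     XLATEX              Skip the TR path
XLATETR  BCTR  R15,0               Machine length is length minus one
         EX    R15,XLATEI          Execute TR with run-time length
         J     XLATEX              Branch around executed instruction
XLATEI   TR    DATA(0),EBCDIC      Target of EX only
XLATEX   DS    0H
```

The short path uses the simple one-register loop from the quoted message rather than the unrolled version, since the unrolled form as shown handles only lengths that are a multiple of Stride.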


--
Edward E Jaffe
Phoenix Software International, Inc
5200 W Century Blvd, Suite 800
Los Angeles, CA 90045
310-338-0400 x318
[EMAIL PROTECTED]
http://www.phoenixsoftware.com/

