Re: Long translate (TR) instruction?

Art Celestini Thu, 27 Mar 2008 21:24:42 -0700

William:

Thanks (again).  I found these results most interesting.


Art


At 11:18 PM 3/27/2008, William H. Blair wrote:
  
>Art Celestini wrote:
>
>> I'm convinced that TRE and TR are faster but it seems that 
>> a truly "fair" comparison of solutions to the stated problem 
>> should have included equivalent moves in the TRE and TR 
>> solutions. 
>
>I did write and run versions with the code like that. And, I 
>said so:
>
>|                   I've got code that needs to translate stuff
>| in a buffer and it does not need it moved. And I have other
>| code that first moves it and then translates it, because it
>| doesn't want to clobber what it's translating. But, I did it
>| both ways, just to find out for sure if it made a difference.
>| It does not.
>
>But since you asked, I added those into the mix, so you can 
>see and judge for yourself:
>
>         TIME (IN SECONDS) FOR 001,000,000 REPETITIONS OF:
>         -------------------------------------------------        
>--BYTES-  NO TR(E)  TRE INPL  TRE MVC   TR  INPL  TR  MVC  
>======== --------- --------- --------- --------- --------- 
>00000800 14.939655  1.245189  1.642310  1.082476  1.236875 
>00000400  7.162529  0.731567  0.971124  0.487941  0.580783 
>00000200  3.593004  0.461754  0.673962  0.206117  0.268123 
>00000100  1.802772  0.253433  0.342846  0.032038  0.050725 
>000000C0  1.355390  0.240958  0.311724  0.031969  0.048488 
>00000080  0.909253  0.210573  0.276103  0.031942  0.046119 
>00000040  0.463195  0.150320  0.164585  0.032047  0.043604 
>00000020  0.238923  0.101492  0.113927  0.032032  0.042417 
>0000001E  0.225827  0.111231  0.122245  0.032019  0.042544 
>0000001C  0.210944  0.110432  0.122021  0.031966  0.042432 
>0000001A  0.197080  0.110823  0.122119  0.031953  0.042508 
>00000018  0.183400  0.104318  0.116599  0.031982  0.042673 
>00000016  0.169207  0.099349  0.110853  0.031980  0.042465 
>00000014  0.155477  0.100393  0.109962  0.032081  0.042704 
>00000012  0.141733  0.099860  0.111362  0.031961  0.042495 
>00000010  0.127308  0.070471  0.083389  0.031962  0.041866 
>0000000E  0.113336  0.074843  0.086993  0.031981  0.041867 
>0000000C  0.099318  0.073958  0.086677  0.031962  0.041833 
>0000000A  0.085462  0.074848  0.086733  0.032057  0.041985 
>00000008  0.071609  0.069932  0.081476  0.030228  0.038990 
>00000007  0.064623  0.058755  0.068647  0.030245  0.039025 
>00000006  0.057541  0.058729  0.068720  0.030278  0.038971 
>00000005  0.050582  0.058701  0.068568  0.030230  0.038931 
>00000004  0.043603  0.058764  0.068620  0.030246  0.039029 
>00000003  0.036664  0.058748  0.068683  0.030220  0.038934 
>00000002  0.029665  0.058824  0.068732  0.030386  0.039100 
>00000001  0.022716  0.059113  0.069109  0.029829  0.038662 
>00000000  0.005250  0.016894  0.005825  0.005239  0.005835 
>
>TESTNAME  DESCRIPTION
>--------  --------------------------------------------
>NO TR(E)  Basic move and translate, one byte at a time
>TRE INPL  TRE loop in-place
>TRE MVC   TRE loop buffer-to-buffer move first
>TR  INPL  TR  loop in-place
>TR  MVC   TR  loop buffer-to-buffer move first  
>
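For readers unfamiliar with the variants in the table, here is a minimal sketch of what three of them compute, written in Python rather than assembler. It is not the benchmark code from this thread: the function names and the lowercase-to-uppercase table are hypothetical, and `bytearray.translate` merely stands in for a TR instruction, which handles at most 256 bytes per execution.

```python
def make_table():
    # Hypothetical 256-byte translate table: ASCII a-z -> A-Z, identity elsewhere.
    table = bytearray(range(256))
    for c in range(ord("a"), ord("z") + 1):
        table[c] = c - 32
    return bytes(table)

def translate_bytewise(src, table):
    # "NO TR(E)" analogue: move and translate one byte at a time.
    out = bytearray(len(src))
    for i, b in enumerate(src):
        out[i] = table[b]
    return out

def translate_in_place(buf, table):
    # "TR INPL" analogue: translate the buffer where it sits,
    # one chunk of up to 256 bytes per step.
    for off in range(0, len(buf), 256):
        buf[off:off + 256] = buf[off:off + 256].translate(table)

def translate_copy_first(src, table):
    # "TR MVC" analogue: copy to a separate output buffer first,
    # then translate the copy so the input is left untouched.
    out = bytearray(src)
    translate_in_place(out, table)
    return out
```

All three produce the same bytes; the table above measures only how much CPU time each approach costs.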
>TR is always faster than TRE. Having to move the data
>from an input buffer to a separate output buffer for
>translation increases the CPU time required by ~15%.
>
>That is still way less than the overhead of the basic
>move and translate, which is the fastest technique 
>only for 0, 1, 2, or 3 bytes (for more than 3 bytes,
>the basic TR loop, or even the TR loop with the data
>to be translated having to be moved to the output 
>buffer first, is fastest).
>
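The cost of moving the data first can be read straight off the table. The sketch below is only arithmetic on the TR INPL and TR MVC columns for a few rows, assuming the figures are taken as posted. It suggests the relative overhead is about 14% at 0x800 bytes and larger in relative terms for short buffers, where the absolute difference is only about 0.01 to 0.02 seconds per million repetitions.

```python
# (bytes, TR in place, TR with MVC first), seconds per 1,000,000 repetitions,
# copied from the table above.
ROWS = [
    (0x800, 1.082476, 1.236875),
    (0x400, 0.487941, 0.580783),
    (0x200, 0.206117, 0.268123),
    (0x100, 0.032038, 0.050725),
    (0x010, 0.031962, 0.041866),
]

def copy_overhead_pct(in_place, copy_first):
    # Extra CPU time of the move-first variant, as a percentage of in-place time.
    return 100.0 * (copy_first - in_place) / in_place

for size, in_place, copy_first in ROWS:
    extra = copy_first - in_place
    print(f"{size:#06x}: +{extra:.6f} s, {copy_overhead_pct(in_place, copy_first):.1f}%")
```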
>The above figures include the "equivalent moves" to make 
>it a 'truly "fair" comparison of solutions to the stated 
>problem'. They reflect what I have always observed about
>such tests: a well-coded, basic, tight MVC loop (or an
>MVCL) is pretty fast compared to almost anything else
>that involves a half-dozen or so instructions that do
>virtually anything. Thus, counting the CPU time that is
>required to move the data to a separate buffer as part
>of the overhead doesn't actually add that much to the
>CPU time required to get the whole job done. 
>
>I suspect that this is simply due to the fact that MVC
>and MVCL are already pretty well-optimized for the job
>they do. Even a basic, tight loop will be limited by 
>some performance constraint, probably by the rate at
>which instructions whose execution cannot be overlapped
>can be pumped through the machine (in contrast to blobs 
>of data MVCing and TR[T]ing thru the wires all as part 
>of one instruction).
>
>Today, for all intents and purposes, the time required 
>to execute any given standard instruction is the same 
>as any other. This is because the work to be done can be
>done in the available time, before another instruction
>is fetched and shoved through the internal machinery.
>The instructions which process more than a word or two 
>of data take longer, of course. But some of those are
>very highly optimized in hardware. For example, the
>LM and STM instructions are no longer pigs. They are,
>in fact, fairly effective substitutes for MVC, except
>that you toast the contents of several registers when
>you use enough to make it worthwhile.  
>
>Thus, optimization in our world today, except for the
>scientific and numerical programmer, has become a job
>of simply cutting down the number of instructions one
>executes ... that is, simply cutting the path length.
>
>And that is the simple reason why a basic TR loop,
>even with an MVC/MVCL (just) ahead of it, is still 
>the quickest way to get the job done, because fewer
>instructions overall are executed.
>



==================================================
Art Celestini       Celestini Development Services
Phone: 201-670-1674                    Wyckoff, NJ
=============  http://celestini.com  =============
Mail sent to the "From" address  used in this post
will be rejected by our server.   Please send off-
list email to:  ibmmain<at-sign>celestini<dot>com.
==================================================
