Kirk Wolf said:

> I'm looking for the fastest way in assembler to 
> translate data in one buffer to another using a 
> 256-byte translate table.

Want my test program to help you decide? Let me know.
But don't waste your time. I already know the answer.

Look at my TR subroutine in a previous post as a place
to get started (if you need that).

Shmuel Metz (Seymour J.) said:

> The fastest way on one model may not be the fastest 
> way on another model.

True.  But -- I just knew you were expecting a but -- I
have been looking at this off and on for about 8 years,
and have had access to most (if not all) models of zXXX
hardware (currently I have access to a 2094, 2096, 2086
and a 2066). I have NEVER found an instruction sequence
that would run faster than a simple old-fashioned TR[T] 
(or MVC or CLC) loop on ANY z model machine - except an 
MVCL or CLCL for a "very large" number of bytes.  Since
very little code like this is on a performance-critical
path, I mostly just use whatever is convenient; in such
a case it does not really matter. If I believe the code
is on a performance-critical path I'll use a subroutine
that does it the old-fashioned way (TR/TRT/CLC/MVC loop
or whatever), unless I have special knowledge that lots
of bytes (more than 4KB) need to be MVCed/CLCed.  Thus,
if Mr. Wolf currently has a z box (Duh!) I can tell him
that the answer to that question -- TODAY -- is to just
do an old-fashioned MVC loop (or an MVCL) to move the
data to the buffer where it will be needed after
translation, and then use an old-fashioned TR loop to
translate it in place in that (output) buffer. On any z
box that exists today
that is the fastest way. And I bet it stays that way in
the future, probably forever. Why? There is very little
that microcode/millicode can do faster than the current
raw, basic machine can do with these fundamental S/360-
era instructions. The same basic internal operations to
get the job done have to be done in each instance so it
does not matter whether the orders are coming from code
or millicode/microcode. Now, if the machine offered
TR[T]L instructions (long forms of TR/TRT, by analogy
with MVCL and CLCL), then probably -- just as is the
case for MVCL and CLCL -- those would run just a little
faster than an old-fashioned basic TR[T] loop, but only
for large numbers of bytes. But we don't have TR[T]L so
the System/360 instructions are still the fastest way.
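
For readers who want the shape of the two-step approach
described above (copy to the output buffer, then translate
in place), here is a minimal sketch in C rather than
assembler. It only models the logic and says nothing about
timing. The function name copy_then_translate and the
CHUNK constant are made up for illustration; the 256-byte
chunk size mirrors the maximum length of a single MVC or
TR.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for illustration: work in 256-byte pieces, the most
   a single MVC or TR instruction can process. */
enum { CHUNK = 256 };

/* Hypothetical helper (not from the post): copy src to dst, then
   translate dst in place through a 256-byte table. The buffers
   must not overlap. */
void copy_then_translate(unsigned char *dst,
                         const unsigned char *src,
                         size_t len,
                         const unsigned char table[256])
{
    size_t off, i, n;

    /* Step 1, the "MVC loop": move the data to the output buffer. */
    for (off = 0; off < len; off += n) {
        n = (len - off < CHUNK) ? (len - off) : CHUNK;
        memcpy(dst + off, src + off, n);
    }

    /* Step 2, the "TR loop": replace each output byte with the
       table entry that the byte's value selects. */
    for (off = 0; off < len; off += n) {
        n = (len - off < CHUNK) ? (len - off) : CHUNK;
        for (i = 0; i < n; i++)
            dst[off + i] = table[dst[off + i]];
    }
}
```

In assembler, each pass of the first loop would be one MVC
(or the whole loop a single MVCL) and each pass of the
second loop one TR, with the short final piece typically
handled through EX. The source buffer is never modified,
which is why the data is moved first and translated second.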

--
WB
