From: "Tony Harminc" <[email protected]>
Sent: Saturday, June 16, 2018 3:03 AM
On 14 June 2018 at 15:40, Charles Mills <[email protected]> wrote:
Not the answer to your question but I don't think "TRT performs badly."
It is just that people sometimes assume that because it is a single
instruction in the PoP it must execute roughly as fast as many simple
instructions.
I think it is fast for what it does. Picture "writing TRT in assembler" --
implementing it as a subroutine as if the opcode magically vanished.
Now picture a version of that only somewhat faster because millicode has
some special tricks up its sleeve -- that's TRT.
Yup. A byte-at-a-time subroutine.
It's not a subroutine. The original (on S/360) was about ten times faster
than the equivalent in ordinary character-manipulating instructions.
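
To make the "TRT written as a subroutine" picture concrete, here is a minimal Python sketch of the architected behaviour as a byte-at-a-time loop. The function name `trt` and the tuple it returns are illustrative assumptions: the real instruction reports its result in general registers 1 and 2 and in the condition code, and the sketch says nothing about how hardware or millicode actually implements it.

```python
def trt(data: bytes, table: bytes):
    """Model TRT: scan data left to right, using each data byte as an
    index into a 256-byte table of function bytes.

    Returns (cc, index, function_byte):
      cc 0 -> every function byte was zero (index and function_byte are None)
      cc 1 -> nonzero function byte found before the last data byte
      cc 2 -> nonzero function byte found at the last data byte
    """
    if len(table) != 256:
        raise ValueError("table must be 256 bytes")
    for i, b in enumerate(data):
        fb = table[b]              # which table byte is read depends on the data
        if fb != 0:
            cc = 2 if i == len(data) - 1 else 1
            return cc, i, fb       # stop at the first nonzero function byte
    return 0, None, None


# Hypothetical table that stops the scan on an EBCDIC blank (X'40').
table = bytearray(256)
table[0x40] = 0x04
```

For example, `trt(bytes([0xC1, 0xC2, 0x40, 0xC3]), table)` stops at offset 2 and never looks at the fourth data byte.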
Ignoring for the moment what Dan says about offloading TRT to special
hardware, there is still a small but fundamental problem with TRT and
TR, as architected in the early 1960s. Namely, access failures
*for the table* (operand 2) are dependent on the actual data in
operand 1. If your table straddles a page boundary, it is entirely
ordinary for your access to those two pages to be different, whether
due to key protection, page protection, or just that one is paged in
and the other isn't.
The programmer can usually manage to avoid that.
TRT and TR are defined so as to program check
only if the data requires it, so if no data byte causes a given
function byte to be referenced, there will be no program checks
related to the address of that function byte.
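
A small Python sketch of that data dependence, for a table that straddles a page boundary. The table address and the helper name `table_pages_referenced` are hypothetical, chosen only for illustration; the 4K page size is the usual one. It models TR, which references a function byte for every data byte.

```python
PAGE_SIZE = 4096  # 4K pages


def table_pages_referenced(data: bytes, table_addr: int) -> set:
    """Return the page numbers holding the function bytes that a TR over
    `data` would reference. Pages not in this set are never touched, so
    they cannot cause a program check."""
    return {(table_addr + b) // PAGE_SIZE for b in data}


# Hypothetical table address: the table starts 128 bytes below a page
# boundary, so function bytes X'00'-X'7F' are on page 1 and
# X'80'-X'FF' are on page 2.
TABLE_ADDR = 0x2000 - 0x80
```

With this layout, data consisting only of bytes below X'80' references page 1 alone, so an inaccessible page 2 goes unnoticed until some data byte of X'80' or above turns up.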
A similar problem arises with the data (operand 1) of TRT: program
checks are recognized only if strictly necessary. If the instruction
ends early because of the value of the function byte addressed by a
data byte, then there's no program check even if unexamined parts of
operand 1 are inaccessible. This is a more common situation not unique
to TRT (presumably SRST and friends would also have it), but still it
must surely be checked for, essentially one byte at a time.
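
The operand 1 case can be sketched the same way. This is a toy model under stated assumptions: `memory` is a hypothetical dict from page number to page contents, a missing page stands for an inaccessible one, and `AccessException` stands in for the program check. The table is kept outside the modelled storage to keep the sketch short.

```python
PAGE_SIZE = 4096


class AccessException(Exception):
    """Stands in for a program check on an inaccessible page."""


def fetch(memory: dict, addr: int) -> int:
    """Fetch one byte; a page missing from `memory` is inaccessible."""
    page = memory.get(addr // PAGE_SIZE)
    if page is None:
        raise AccessException(hex(addr))
    return page[addr % PAGE_SIZE]


def trt_scan(memory: dict, op1_addr: int, length: int, table: bytes):
    """Byte-at-a-time TRT over operand 1. Returns the offset of the first
    data byte whose function byte is nonzero, or None if there is none."""
    for i in range(length):
        b = fetch(memory, op1_addr + i)   # touched only if the scan gets this far
        if table[b] != 0:
            return i
    return None


table = bytearray(256)
table[0x40] = 1                 # stop on X'40'
page1 = bytearray(PAGE_SIZE)
page1[0xFFE] = 0x40             # stop byte two bytes before the page end
memory = {1: page1}             # page 2 is deliberately absent
```

An 8-byte operand at X'1FFC' runs four bytes into the missing page, yet `trt_scan(memory, 0x1FFC, 8, table)` returns 2 without any exception. Clear the stop byte and the same call raises `AccessException` at X'2000'.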
The newer variants TRTE and the many TR variants like TROO, TROT, etc.
are defined so as to allow the program check even if the data doesn't
require looking at one or the other page.
Well obviously don't put your table across a page boundary, and
presumably the hardware/millicode will detect this and be able to run
faster. But is this a reasonable requirement for an application
program to have to pay attention to?
It is the programmer who pays attention to such things.
It's very easy to have your table
in a good place, insert some data above it that pushes it over a
boundary,
Again, the programmer can handle it.
and suddenly things slow down. Maybe on modern machines all
this state can be cached, but if the data is different each time as it
would be in a list of words, then I can't see a general way of
speeding up the checking. Maybe Dan knows the answer...
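
For what it is worth, whether a given table placement crosses a page boundary is a one-line check. A minimal sketch, assuming 4K pages and a full 256-byte table; the helper name `straddles_page` is made up for illustration.

```python
PAGE_SIZE = 4096
TABLE_LEN = 256


def straddles_page(table_addr: int, length: int = TABLE_LEN) -> bool:
    """True if the table's first and last bytes lie on different pages."""
    first_page = table_addr // PAGE_SIZE
    last_page = (table_addr + length - 1) // PAGE_SIZE
    return first_page != last_page
```

Since 4096 is a multiple of 256, a 256-byte table aligned on a 256-byte boundary can never straddle a page, whatever gets inserted above it.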