> Possibly the existence of hardware that included such an assist. BTDTGTTS.
Hard to envisage the economics of an assist for an instruction whose worst case is a machine cycle and whose best case is pre-cycle recognition. Most BCTRs don't consume a cycle. Their superfluousness is recognised in the pipe. It's quite phenomenal how many instructions are never "executed" in that sense, but are dealt with in the pipe pre-execution. Hard coding instruction loops for ultimate performance is almost always a waste of effort.

My butt got burnt many years ago with one of the first "slugged" machines (IBM hates the term "knee-capping", so I use it whenever I can), which was the NAS AS/9040. This was a Hitachi S9 processor sold unknee-capped as the AS/9060. The knee-capping was done simply by adding null cycles to the I-stage of certain frequently-used instructions.

One of our wonderful, delightful, charming customers decided to code his own synthetic kernel to measure "MIPS". Most will know my opinion of MIPS. But this guy used a tight loop to execute a selection of "long" instructions - a TRT, a MVCL, something floating-point, etc. - and claimed to measure their performance. It just so happened that the instruction following his target instruction was one of the main ones that had the I-cycles added. But it didn't make a difference worth a damn - his target instructions took so long in the E-stage that the dummy cycles in the I-stage of the following instruction had expired by the time the target instruction terminated.

Now comes the fun bit. He paid us $$$$$ for an upgrade to an AS/9060 and his synthetic kernel ran at EXACTLY the same speed. Obvious - we deleted dummy cycles out of an instruction that was waiting for the E-stage in any case. Cue lawyers. His production workloads showed a performance improvement actually a little bit better than we'd promised. But his goddamn synthetic kernel showed no change at all - and he threatened to sue us!

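For anyone who wants to see the effect in numbers, here is a toy model of it. This is only a sketch under loud assumptions: a two-stage pipe (I-stage feeding E-stage) with a one-instruction buffer between them, and cycle counts that are pure invention (50 E-cycles for the "long" instruction, 4 null cycles of padding). None of these figures come from the real AS/9040 or AS/9060; the names total_cycles, synthetic_kernel and short_mix are hypothetical.

```python
def total_cycles(loop, iterations):
    """Cycles needed to run `loop` `iterations` times on a toy two-stage pipe.

    `loop` is a list of (i_cycles, e_cycles) pairs, one per instruction.
    Assumed rules (illustrative only, not any real machine):
      - an instruction's I-stage starts once the previous instruction has
        left the I-stage AND has entered the E-stage (one-slot buffer);
      - its E-stage starts once its own I-stage is done AND the previous
        instruction has left the E-stage.
    """
    i_done = e_start = e_done = 0
    for _ in range(iterations):
        for i_cycles, e_cycles in loop:
            i_start = max(i_done, e_start)
            i_done = i_start + i_cycles
            e_start = max(i_done, e_done)
            e_done = e_start + e_cycles
    return e_done


def synthetic_kernel(pad):
    # A "long" instruction (invented: 1 I-cycle, 50 E-cycles) followed by a
    # loop-closing instruction whose I-stage carries `pad` null cycles.
    return [(1, 50), (1 + pad, 1)]


def short_mix(pad):
    # Stand-in for ordinary code: a short instruction followed by the same
    # padded instruction, with nothing long in the E-stage to hide behind.
    return [(1, 1), (1 + pad, 1)]


for name, mix in (("synthetic kernel", synthetic_kernel), ("short mix", short_mix)):
    slugged = total_cycles(mix(pad=4), 100)
    unslugged = total_cycles(mix(pad=0), 100)
    print(f"{name}: slugged={slugged} unslugged={unslugged}")
```

In this model the padded I-stage of the second instruction is over long before the E-stage frees up, so taking the padding out changes nothing for the kernel, while the short mix speeds up noticeably.
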
And having worked for Morino Associates and having been a founder member of CMG in the UK and a past Vice-Chairman, I have a loathing for synthetic kernels that can be comfortably described as passionate. If not obsessive. Don't do it. You know not with what you mess.

IBM knee-capped its machines of that era simply by reducing the HSB. The 3033S famously had a high speed buffer of 512 bytes - around a quarter the capacity of each of the ten-cent 3270s attached to it. IBM proposed such a machine for the Finanzamt Charlottenburg in Berlin, and actually succeeded in writing a benchmark that fitted in 512 bytes, so the thing ran like a 3033U. I'd love to know what happened when that thing met their real workload. Modern knee-capping is more comprehensive.

Epilogue: There are always exceptions to every rule. One day when I was out consulting as an ace Assembler programmer, I had an internal client turn up with an interesting problem. It was an in-storage search mechanism against an ordered table. He'd been looking at the table structure and had come to the conclusion that classic binary chop might not be the best way - something heuristic might be better. Leonardo da Pisa gave us a clue, and I wrote a really tight RR routine. It cut an hour off the overnight batch run on a /65, but the big surprises came later when the hardware started reassigning GPRs. Holy moly. Things have changed an awful lot, but that routine still runs every night because of some side results it produces. What used to take four hours now takes eight minutes.

--
Phil Payne
http://www.isham-research.co.uk
+44 7833 654 800

