AFAIK, there is no reason to expect that the execution of a 64-bit instruction takes any longer than the execution of an equivalent 32-bit instruction. For example, the execution of the 32-bit ADD (AR) instruction should be comparable to the execution of the 64-bit ADD (AGR).
However, the astute historian will have noticed that, like many of the original S/360 instructions, AR and friends are 2-byte instructions, and the vast majority of the 2-byte opcodes were assigned in the original S/360 architecture. Similarly, the original 32-bit RS- and RX-format instructions are 4 bytes long, whereas their newer 64-bit equivalents are 6 bytes. Since most of the newer 64-bit instructions are 4- or 6-byters, there is a second-order effect of the longer instructions burning up space in the instruction cache. Likewise, there is a second-order effect of larger storage operands burning up space in the data cache ... which has the potential of slightly slowing the overall execution speed.

The potential delay to which Gary refers – that of needing more DAT tables – should apply only when the size of the address space exceeds 2 G-bytes. The architecture doesn't require region-third, region-second, or region-first translation tables when they're not needed, and no sane OS will build those tables unless they are needed. Even if you're using a huge address space, the TLB should alleviate any delay after the first translation of an address. Furthermore, modern DAT provides for so-called "large pages" (an architectural inaccuracy), where the page- and segment-table entries are omitted from the TLB.

Back at SHARE 113 in Denver (2009), I did a presentation on how you can effectively time a sequence of instructions. Basically:

1. Build a series of a few hundred ops
2. Run them once to prime the I-cache and D-cache
3. Note the start time
4. Loop through them a few thousand times
5. Note the stop time
6. Divide the total elapsed time by the total number of instructions executed
7. Fudge out the loop overhead, and you've got a useful SWAG for comparison with other sequences
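For what it's worth, here is a minimal sketch of that harness in C rather than HLASM; I have not reconstructed the SHARE material, just the seven steps above. clock_gettime() stands in for whatever a mainframe version would use for timing (presumably the TOD clock via STCK/STCKE), and SEQUENCE_UNDER_TEST, OPS_IN_SEQUENCE, and ITERATIONS are placeholder names of my own. The "sequence" below is only a compiler barrier, so plug in a real unrolled run of the instructions you care about before trusting any numbers.

#include <stdio.h>
#include <time.h>

#define OPS_IN_SEQUENCE 256     /* "a few hundred ops" in the unrolled test sequence */
#define ITERATIONS      10000L  /* "a few thousand" passes through that sequence     */

/* Placeholder for step 1: in a real test this would be OPS_IN_SEQUENCE copies
   of the instruction(s) being compared (e.g. AR vs. AGR); here it is only a
   compiler barrier so the harness compiles and runs anywhere. */
#define SEQUENCE_UNDER_TEST()  __asm__ __volatile__("" ::: "memory")

static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    struct timespec t0, t1, t2;
    volatile long spin = 0;                  /* keeps the calibration loop from vanishing */

    SEQUENCE_UNDER_TEST();                   /* step 2: prime the I-cache and D-cache */

    clock_gettime(CLOCK_MONOTONIC, &t0);     /* step 3: note the start time */
    for (long i = 0; i < ITERATIONS; i++)    /* step 4: loop a few thousand times */
        SEQUENCE_UNDER_TEST();
    clock_gettime(CLOCK_MONOTONIC, &t1);     /* step 5: note the stop time */

    /* Step 7's "fudge": time an empty loop with the same trip count so the
       branch-and-count overhead can be subtracted from the measurement. */
    for (long i = 0; i < ITERATIONS; i++)
        spin++;
    clock_gettime(CLOCK_MONOTONIC, &t2);

    double work_ns = elapsed_ns(t0, t1);
    double loop_ns = elapsed_ns(t1, t2);

    /* Step 6: divide by the total number of instructions executed. */
    double per_op = (work_ns - loop_ns) / ((double)ITERATIONS * OPS_IN_SEQUENCE);
    printf("roughly %.3f ns per op -- a SWAG, not a benchmark\n", per_op);
    return 0;
}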
