On Mon, 15 Aug 2011 09:43:37 -0500, Blaicher, Chris wrote:
>L1.5 is about 8 times slower than L1, if memory serves me correctly.
There is essentially no delay in fetching from L1 cache. The 1 cycle that
it takes is included in the execution cycle. The L1.5 cache delay is about
13 cycles. The L2 cache delay is 90-230 cycles, depending on whether the L2
cache is local or remote. These times are for the z10.
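
To put those numbers in perspective, here is a rough Python sketch of a weighted average fetch delay. The delay figures are the z10 ones quoted above; the hit fractions are made up for illustration and are not measurements.

```python
# z10 cache delays in cycles, taken from the figures above.
# L1 is 0 because its 1 cycle is already part of the execution cycle.
Z10_DELAY = {"L1": 0, "L1.5": 13, "L2 local": 90, "L2 remote": 230}

def average_delay(hit_fraction):
    """Weighted average fetch delay; hit_fraction maps cache level -> share of fetches."""
    total = sum(hit_fraction.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(Z10_DELAY[level] * share for level, share in hit_fraction.items())

# Hypothetical mix (an assumption): 95% L1, 4% L1.5, 1% local L2.
example = average_delay({"L1": 0.95, "L1.5": 0.04, "L2 local": 0.01})
print(round(example, 2))
```

Even with only 1% of fetches going to a local L2 in this assumed mix, the L2 term contributes more to the average than the L1.5 term does.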
>The other thing to consider is the superscalar nature of a z10 and z196.
>The machine wants to process 2 or 3 instructions in the same cycle if they
>are not dependent on each other. If instruction #2 is dependent on
>instruction #1, then #2 gets held until #1 completes. It gets even more
>complicated. If #2 is just using in a register what #1 produced there is a
>delay of 3 or 4 cycles, but if #2 is using what #1 produced as part of an
>address, the delay can be about 8 or 9 cycles.
>
>Examples
> L R1,data
> AHI R1,1 delay 3/4 cycles
On the z10, the delay is 5 cycles.
> LA R1,structure
> L R2,20(,R1) delay 8/9 cycles
There is a special "LA Bypass". The delay is only 4 cycles on a z10.
For cases other than L and LA the delay is longer:
  A     R1,=F'1'
  MHI   R1,6          delay 6 cycles
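
The three z10 cases above can be collected into a small lookup. This is only a sketch in Python: the table holds exactly the producer/consumer pairs from the examples and nothing else, and the function name is made up. It does not try to generalize to other instruction pairs.

```python
# z10 dependency delays in cycles for the three example pairs above.
# Keys are (producer, consumer) opcodes; no other pairs are modeled.
Z10_EXAMPLE_DELAY = {
    ("L", "AHI"): 5,   # loaded value used as a register operand
    ("LA", "L"): 4,    # "LA Bypass": LA result used in an address
    ("A", "MHI"): 6,   # case other than L and LA
}

def total_delay(pairs):
    """Sum the delay cycles for a list of (producer, consumer) pairs."""
    return sum(Z10_EXAMPLE_DELAY[pair] for pair in pairs)

print(total_delay([("L", "AHI"), ("LA", "L"), ("A", "MHI")]))
```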
The z196/z114 is very different from the z10 and z9. Some instructions like
A are split into two parts (L, AR) with the L target and AR source being a
temporary internal register. The parts can be scheduled separately. The
z196/z114 also has an out-of-order instruction scheduler. Only the Put
Away stage of the pipeline is executed in order.
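
As an illustration of the split, here is a minimal Python sketch that rewrites A into its two parts. Only A is handled, the function name and the "TMP" register name are invented for the example, and the actual rules the hardware uses for other instructions are not covered here.

```python
def split_instruction(op, reg, storage, temp="TMP"):
    """Return the parts an instruction is split into, as (op, target, source) tuples.

    Only A is modeled: it becomes L into a temporary internal register
    followed by AR from that register. "TMP" is a placeholder name.
    """
    if op != "A":
        return [(op, reg, storage)]  # anything else is passed through unchanged
    return [("L", temp, storage), ("AR", reg, temp)]

print(split_instruction("A", "R1", "=F'1'"))
```

The two resulting parts are what the scheduler can then place separately.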
David Bond