On Mon, 15 Aug 2011 09:43:37 -0500, Blaicher, Chris wrote:
>L1.5 is about 8 times slower than L1, if memory serves me correctly.

There is essentially no delay in fetching from L1 cache.  The 1 cycle that
it takes is included in the execution cycle.  The L1.5 cache delay is about
13 cycles.  The L2 cache delay is 90-230 cycles, depending on whether the L2
cache is local or remote.  These times are for the z10.
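
As a rough illustration of what these numbers mean in practice, the sketch
below computes an expected fetch delay from the z10 figures above.  The hit
rates and the function name are made up for the example, not measurements,
and the model assumes every fetch that misses L1 and L1.5 is satisfied by L2.

```python
# z10 fetch delays in cycles, taken from the figures quoted above.
L1_DELAY = 0            # the 1 cycle is included in the execution cycle
L15_DELAY = 13
L2_LOCAL_DELAY = 90     # low end of the quoted L2 range
L2_REMOTE_DELAY = 230   # high end of the quoted L2 range


def average_fetch_delay(l1_hit, l15_hit, l2_delay):
    """Expected extra cycles per fetch.

    l1_hit   -- fraction of fetches satisfied by L1 (assumed value)
    l15_hit  -- fraction of L1 misses satisfied by L1.5 (assumed value)
    l2_delay -- cycles charged when both miss (90 to 230 on a z10)
    Assumption: anything that misses L1 and L1.5 is found in L2.
    """
    l1_miss = 1.0 - l1_hit
    return (l1_hit * L1_DELAY
            + l1_miss * l15_hit * L15_DELAY
            + l1_miss * (1.0 - l15_hit) * l2_delay)


if __name__ == "__main__":
    # Hypothetical hit rates, chosen only for illustration.
    for name, l2 in (("local L2", L2_LOCAL_DELAY),
                     ("remote L2", L2_REMOTE_DELAY)):
        print(name, round(average_fetch_delay(0.95, 0.80, l2), 2))
```

Even with most fetches hitting L1, the L2 term dominates the average, which
is why the local/remote difference matters.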

>The other thing to consider is the superscalar nature of the z10 and z196.
>The machine wants to process 2 or 3 instructions in the same cycle if they
>are not dependent on each other.  If instruction #2 is dependent on
>instruction #1, then #2 gets held until #1 completes.  It gets even more
>complicated.  If #2 is just using in a register what #1 produced there is a
>delay of 3 or 4 cycles, but if #2 is using what #1 produced as part of an
>address, the delay can be like 8 or 9 cycles.
>
>Examples
>         L    R1,data
>         AHI  R1,1          delay 3/4 cycles

On the z10, the delay is 5 cycles.

>         LA   R1,structure
>         L    R2,20(,R1)    delay 8/9 cycles

There is a special "LA Bypass".  The delay is only 4 cycles on a z10.

For cases other than L and LA the delay is longer:
        A     R1,=F'1'
        MHI   R1,6       delay 6 cycles
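
The three z10 cases above can be collected into a small lookup.  This is only
a sketch: the table and function names are hypothetical, the values cover just
the instruction pairs shown in this message, and the rule that each unrelated
instruction placed in between hides one cycle is a simplifying assumption, not
something the hardware guarantees.

```python
# Delay (cycles) before a dependent instruction can use a result on a z10,
# for the producer/consumer pairs quoted in this message only.
Z10_RESULT_DELAY = {
    "L": 5,    # L R1,data      then AHI R1,1
    "LA": 4,   # LA R1,structure then L R2,20(,R1)  ("LA Bypass")
}
DEFAULT_DELAY = 6  # "cases other than L and LA", e.g. A then MHI


def stall_cycles(producer, independent_between=0):
    """Estimated cycles the dependent instruction waits.

    producer            -- mnemonic of the instruction producing the value
    independent_between -- unrelated instructions scheduled in between
    Assumption: each unrelated instruction hides exactly one cycle.
    """
    delay = Z10_RESULT_DELAY.get(producer, DEFAULT_DELAY)
    return max(0, delay - independent_between)


if __name__ == "__main__":
    for mnemonic in ("L", "LA", "A"):
        print(mnemonic, stall_cycles(mnemonic))
    # Three unrelated instructions placed between the L and the AHI.
    print("L with 3 fillers", stall_cycles("L", 3))
```

The practical point is the last call: moving independent work between the
producing instruction and its consumer is how the delay gets hidden on an
in-order machine like the z10.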

The z196/z114 is very different from the z10 and z9.  Some instructions like
A are split into two parts (L, AR) with the L target and AR source being a
temporary internal register.  The parts can be scheduled separately. The
z196/z114 also has an "Out Of Order" instruction scheduler.  Only the Put
Away stage of the pipeline is executed in-order.

David Bond
