Someone from IBM sent me a couple of presentations, some of which I had
contributed to.  They cover things like *instruction decode, address
generate, execute, put away*, and parallelism.
For example:

   L  R4,VALUE       get the address of the data
   L  R5,0(R4)       uses R4 as its base register, so it has to wait
                     until the previous load has finished

but other instructions with no dependency on R4 can be processed in
parallel.
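Here is a minimal C sketch of that point (my own illustration, not from the
IBM material, which was in z assembler): it chases a randomly permuted chain
of indices, where every load must wait for the previous one, and compares
that with a sequential sum, where the loads are independent and can overlap.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)   /* 1M elements, 8 MB - bigger than most L1/L2 caches */

/* Small xorshift generator so the shuffle is portable
   (RAND_MAX can be as low as 32767). */
static unsigned long long rng = 88172645463325252ULL;
static size_t xrand(void) {
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return (size_t)rng;
}

static double now(void) {   /* POSIX monotonic clock, in seconds */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    size_t *next = malloc(N * sizeof *next);
    size_t i, j, t, p, sum;
    double t0, t1, t2, t3;

    /* Sattolo's shuffle: one random cycle through all N slots, so the
       chase visits every element and defeats the hardware prefetcher. */
    for (i = 0; i < N; i++) next[i] = i;
    for (i = N - 1; i > 0; i--) {
        j = xrand() % i;
        t = next[i]; next[i] = next[j]; next[j] = t;
    }

    /* Dependent loads: the address of each load is the result of the
       previous one, so the pipeline waits, load after load. */
    t0 = now();
    for (p = 0, i = 0; i < N; i++) p = next[p];
    t1 = now();

    /* Independent loads: all addresses are known up front, so the CPU
       can have many loads in flight at once. */
    t2 = now();
    for (sum = 0, i = 0; i < N; i++) sum += next[i];
    t3 = now();

    printf("dependent  : %.1f ns/load (p=%zu)\n", (t1 - t0) * 1e9 / N, p);
    printf("independent: %.1f ns/load (sum=%zu)\n", (t3 - t2) * 1e9 / N, sum);
    free(next);
    return 0;
}

Compile with something like cc -O2 chase.c; on most machines the dependent
loop comes out many times slower per load.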

Data from the on-chip L1 cache arrives much faster than data that has to
come from a cache in a different book.
Don't have two threads running concurrently sharing the same cache block
for their private data: every store by one thread invalidates the block in
the other thread's cache, so the line bounces back and forth.
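And a minimal sketch of that false-sharing effect, assuming POSIX threads
and a 64-byte cache line (z machines use 256-byte lines, so LINE would need
changing there); the names bump_packed/bump_padded and the iteration count
are mine:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000L
#define LINE  64   /* assumption: 64-byte lines (typical x86); z uses 256 */

/* Each thread has its own counter, but in "packed" the two counters sit
   in the same cache line; in "padded" each counter gets a line to itself. */
static _Alignas(LINE) volatile long packed[2];
static struct { _Alignas(LINE) volatile long v; } padded[2];

static void *bump_packed(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++) packed[id]++;
    return NULL;
}
static void *bump_padded(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++) padded[id].v++;
    return NULL;
}

static double run(void *(*fn)(void *)) {
    struct timespec a, b;
    pthread_t th[2];
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (long i = 0; i < 2; i++) pthread_create(&th[i], NULL, fn, (void *)i);
    for (long i = 0; i < 2; i++) pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    printf("same cache line     : %.2f s\n", run(bump_packed));
    printf("separate cache lines: %.2f s\n", run(bump_padded));
    return 0;
}

Compile with cc -O2 -pthread; the packed version typically runs several
times slower, purely because the line ping-pongs between the cores' caches.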

All good stuff

Colin

On Wed, 1 Mar 2023 at 17:59, Colin Paice <colinpai...@gmail.com> wrote:

> I've been asked to give a talk on performance to a University Computing
> department.
>
> I know the z hardware has built-in instrumentation which allows you to
> see where the delays were for a particular instruction.  For example, this
> load instruction got its data from the L3 cache and took x nanoseconds.
>
> Is there a presentation on this?
>
> I remember seeing a presentation (it may have been IBM confidential)
> showing that a load could be slow if the data was in the cache in a book
> 3 ft away, compared to it being in the cache on the chip.
> Also, the second time round a loop is faster than the first, because
> the instructions are already in the instruction cache.
>
> This was all mind blowing stuff!
>
> Colin
>
