Re: Each CPU usage

Anne & Lynn Wheeler Wed, 29 Aug 2007 12:03:13 -0700

[EMAIL PROTECTED] (Thompson, Steve) writes:

Imagine, you have a 3081 at 100% and you upgraded to a 3084 (basically
you added the other 3081) and you are still at 100%. Or you have a 3033
and you went to a 470/V8. [I'm not saying these were the systems, just
using them as examples.]


3081 was two processor ... and 3084 was pair of 3081s (four-processors).

the basic 370/3033/308x multiprocessor cache coherency started out
slowing the processor speed by 10% to allow for cross-cache chatter
(i.e. raw two-processor thruput was 1.8 times raw single processor
thruput). this is independent of any actual cache invalidates that
were occurring (i.e. just providing for basic cross-cache
communication). 3084 was even worse since each processor cache had to
listen for x-cache chatter from three other processors (rather than
just one other processor).
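
the 1.8 figure above is just the per-processor slowdown multiplied out. a minimal sketch of that arithmetic; the function name is made up for illustration, and the only number taken from the post is the 10% two-processor slowdown (no slowdown figure is given for the 3084, so none is assumed here):

```c
/* Raw multiprocessor thruput relative to one un-slowed processor.
 * ASSUMPTION: every processor is slowed by the same fraction and
 * nothing else (cache invalidates, locking) is counted. */
static double raw_mp_throughput(int processors, double slowdown)
{
    return processors * (1.0 - slowdown);
}
```

raw_mp_throughput(2, 0.10) gives the 1.8 times single processor thruput mentioned above.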

308x wasn't even going to have a single processor version ... however,
eventually a single processor, 3083 did ship. This was primarily
motivated by ACP/TPF which didn't have multiprocessor at the time
(base 3083 processor was almost 15 percent faster than one of the
3081 processors since the multiprocessor x-cache chatter slowdown
was eliminated)

in the 3081 time-frame ... both vm and mvs had kernel storage re-org
to carefully align storage on cache-line boundaries (and multiples of
cache lines). this was to eliminate a lot of cache-line "thrashing"
where two different storage locations overlapped in the same
cache-line (and different processors could be simultaneously operating
on the two storage locations). This kernel storage re-org was claimed
to improve system thruput by something over five percent.
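
a rough illustration of the cache-line alignment idea in C11 (not the actual vm or mvs kernel code, which was 370 assembler). the 64-byte line size, the struct names and the helper are all assumptions made up for the example:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 64-byte cache line, chosen only for illustration. */
#define CACHE_LINE 64

/* Packed layout: the two per-processor counters sit 4 bytes apart, so
 * they normally share one cache line and every update by one processor
 * invalidates the line in the other processor's cache. */
struct counters_shared {
    uint32_t cpu0_count;
    uint32_t cpu1_count;
};

/* Aligned layout: each counter starts on its own cache-line boundary. */
struct counters_padded {
    alignas(CACHE_LINE) uint32_t cpu0_count;
    alignas(CACHE_LINE) uint32_t cpu1_count;
};

/* The aligned struct occupies a whole number of cache lines. */
static_assert(sizeof(struct counters_padded) == 2 * CACHE_LINE,
              "each counter should own one full cache line");

/* Nonzero when two byte offsets (measured from a line-aligned base)
 * fall into the same cache line. */
static int same_cache_line(size_t off_a, size_t off_b)
{
    return (off_a / CACHE_LINE) == (off_b / CACHE_LINE);
}
```

the cost is storage: the aligned layout uses 128 bytes for two 4-byte counters, which is the usual trade-off for keeping independently updated fields out of each other's cache lines.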

the other example was a major restructuring of the vm multiprocessor
support between r6 and sp1. the issue was that since acp/tpf didn't
have multiprocessor support, there was a lot of acp/tpf running under
vm370 on 3081s. for the dedicated acp/tpf 3081 operations, that meant
that they ran two copies of acp/tpf (in two different virtual
machines) and/or that one of the processors sat idle most of the time.
for the latter case, the multiprocessor restructuring attempted to get
(some amount of) virtual machine kernel processing running on the
"idle" processor (overlapped with acp/tpf execution on the other
processor). This involved introducing a lot of signal processor
instructions to wake the possibly idle processor to get busy on some
execution and return to executing the (acp/tpf) virtual machine
(specific scenario was overlapping siof instruction emulation and
channel program translation with the acp/tpf virtual machine
execution).

the standard virtual machine multiprocessor support was designed for
efficiently handling lots of totally operations. the sp1
reorganization (for acp/tpf overlapped execution) was generic for all
possible execution environments ... and introduced quite a bit of
overhead (in the acp/tpf scenario it was justified on the basis that
it improved overall thruput ... since there was an otherwise idle
processor).

a lot of existing customers moving from r6 multiprocessor support to
sp1 multiprocessor support found significant increase in multiprocessor
overhead ... a combination of the significant increase in signal
processor instructions, the corresponding interrupts and a lot of new
"spin-lock" activities (just the "new" "spin-locks" measured as much
as ten percent of each processor).

"spin-locks" were typically used to provide exclusive execution for
lots of kernel code. global kernel "spin-locks" were typical of a lot of
60s, 70s and even 80s operating systems (i.e. a single kernel lock
that kernel would attempt to obtain at entry into kernel mode
... interrupt routines, etc ... and spin/loop until it obtained the
lock).
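
the global kernel spin-lock described above can be sketched with C11 atomics, where atomic_flag_test_and_set plays the role of a test&set style instruction. this is only an illustration of the pattern; the names kernel_lock, kernel_entries and kernel_entry are invented for the example:

```c
#include <stdatomic.h>

/* One lock for the whole "kernel" (hypothetical name). */
static atomic_flag kernel_lock = ATOMIC_FLAG_INIT;

/* Stand-in for kernel state that needs exclusive access. */
static long kernel_entries;

/* Spin until the flag was previously clear, i.e. until we set it. */
static void kernel_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&kernel_lock,
                                             memory_order_acquire)) {
        /* busy-wait: another processor is inside the kernel */
    }
}

static void kernel_lock_release(void)
{
    atomic_flag_clear_explicit(&kernel_lock, memory_order_release);
}

/* Everything done in "kernel mode" runs under the single lock. */
static long kernel_entry(void)
{
    kernel_lock_acquire();
    long n = ++kernel_entries;
    kernel_lock_release();
    return n;
}
```

with a single lock like this, a second processor entering the kernel burns cycles in the while loop for as long as the first one stays inside, which is the kind of spin overhead discussed above.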

at the science center,
http://www.garlic.com/~lynn/subtopic.html#545tech


charlie was working on fine-grain multiprocessing kernel locks (lots
of short execution paths rather than the whole kernel) for cp67 when
he invented the compare-and-swap instruction (CAS mnemonic chosen
because they are charlie's initials ... compare-and-swap designation had
to be invented to have something that matched CAS). the attempt to get
CAS added to 370 architecture was initially rebuffed ... the favorite
son operating system considered the test&set locking instruction (used
for os/360 multiprocessor kernel spin-locks) more than sufficient for 370
multiprocessing support. the challenge was to come up with a
non-multiprocessor use for the compare-and-swap instruction ... in
order to get it included in 370 architecture. lots of past posts
mentioning multiprocessor and/or compare-and-swap instruction
http://www.garlic.com/~lynn/subtopic.html#smp

this is where the use for a lot of multithreaded application software
(regardless of whether running on multiprocessor hardware) was
invented .. as well as the programming notes that now appear
in appendix of principles of operation ... i.e.

A.6 Multiprogramming and Multiprocessing Examples
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9ZR003/A.6?SHELF=DZ9ZBK03&DT=20040504121320
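
the kind of lock-free update that compare-and-swap makes possible for multithreaded applications (on one processor or many) looks roughly like the following in C11. this is a sketch in the spirit of those programming notes, not a transcription of them; cas_add and cas_set_bits are names invented here, and atomic_compare_exchange_weak stands in for the 370 CS instruction:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add delta to *word without any lock; returns the new value.
 * If another thread changed *word after we fetched it, the compare fails,
 * 'old' is refreshed with the current contents, and we simply retry. */
static uint32_t cas_add(_Atomic uint32_t *word, uint32_t delta)
{
    uint32_t old = atomic_load(word);
    while (!atomic_compare_exchange_weak(word, &old, old + delta)) {
        /* retry with the refreshed value of 'old' */
    }
    return old + delta;
}

/* Atomically turn on the bits in mask; returns the previous value. */
static uint32_t cas_set_bits(_Atomic uint32_t *word, uint32_t mask)
{
    uint32_t old = atomic_load(word);
    while (!atomic_compare_exchange_weak(word, &old, old | mask)) {
        /* retry with the refreshed value of 'old' */
    }
    return old;
}
```

the point of the pattern is that the thread can be interrupted anywhere in the loop without leaving a lock held; the worst case is one more trip around the loop.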

up until vm/sp1 ... the primary kernel multiprocessor locking support
was something i started out calling "bounce" lock ... rather than
"spinning" on a lock, i had created a mechanism that queued a very
lightweight execution request (for instance, significantly more
lightweight than SRB) when certain requested locks were in use by some
other processor. i had originally developed the technique for a
multiprocessor project (that didn't ship as a product) where a lot of
the related code had been migrated to microcode. When the microcode
could no longer process the operation (in parallel with other
processors) ... it would queue an interrupt into the kernel to
complete the processing. When this project was killed, i translated
the convention to vanilla multiprocessor 370 machines ... some
past posts mentioning the heavily microcoded multiprocessor effort
http://www.garlic.com/~lynn/subtopic.html#bounce
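
one way to picture the "bounce" idea (queue a small request for the lock holder instead of spinning) is the C11 sketch below. it is an assumption-laden illustration based only on the description above, not the actual implementation; every name in it (struct request, struct bounce_lock, bounce_submit, etc.) is invented, and the pending list here is processed in LIFO order for brevity:

```c
#include <stdatomic.h>
#include <stddef.h>

/* A very lightweight unit of deferred work (hypothetical layout). */
struct request {
    struct request *next;
    void (*run)(void *arg);
    void *arg;
};

struct bounce_lock {
    atomic_flag held;
    _Atomic(struct request *) pending;  /* work queued while lock was busy */
};
#define BOUNCE_LOCK_INIT { ATOMIC_FLAG_INIT, NULL }

/* Example request body used for demonstration. */
static void count_hit(void *arg) { *(int *)arg += 1; }

static int bounce_try_acquire(struct bounce_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

/* Holder runs everything that was queued, then frees the lock. If more
 * work arrived in the gap, try to take the lock back and go again. */
static void bounce_release(struct bounce_lock *l)
{
    for (;;) {
        struct request *r = atomic_exchange(&l->pending, NULL);
        while (r != NULL) {
            struct request *next = r->next;  /* read before run() reuses r */
            r->run(r->arg);
            r = next;
        }
        atomic_flag_clear(&l->held);
        if (atomic_load(&l->pending) == NULL || !bounce_try_acquire(l))
            return;  /* nothing left, or a new holder will drain it */
    }
}

/* Returns 1 if the request ran immediately, 0 if it was queued. */
static int bounce_submit(struct bounce_lock *l, struct request *r)
{
    if (bounce_try_acquire(l)) {
        r->run(r->arg);
        bounce_release(l);
        return 1;
    }
    /* Lock busy: push the request and go do something else. */
    struct request *head = atomic_load(&l->pending);
    do {
        r->next = head;
    } while (!atomic_compare_exchange_weak(&l->pending, &head, r));
    /* The holder may have released just before the push; recheck. */
    if (bounce_try_acquire(l))
        bounce_release(l);
    return 0;
}
```

the submitting processor never waits in a loop for the lock; either it does the work itself or it hands the work to whoever currently holds the lock and carries on.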

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
