shmuel+...@patriot.net (Shmuel Metz  , Seymour J.) writes:
> See <http://en.wikipedia.org/wiki/Asymmetric_multiprocessing> and

360/65MP (&370MP) had (symmetric) shared memory but had dedicated
channels on both processors. fully symmetric was simulated by having
twin-tail controllers configured on channels on the different processors
at the same addresses.

os/360 (360/65MP) multiprocessor support had a global spin lock ... using
the test&set instruction (only one processor at a time could execute in
large sections of the code).

360/67 multiprocessor was designed for up to 4 processors ... mostly
2-processors were built and i only know of one 3-processor (no 4-processors
that i know of).  it had a "channel controller" where all processors could
access all channels (but it could still be partitioned into
independent single processors with dedicated channels). 360/67 also had a
different "multi-ported" memory bus ... a single-processor "half-duplex"
had degraded memory bus cycle time compared to 360/67UP and 360/65UP&MP,
but under heavy i/o load, it would have higher throughput because of
reduced memory contention between processors and i/o.

tss/360 multiprocessor had finer-grain lock serialization (than os/360),
so multiple processors could execute in multiple parts of the kernel
simultaneously.

When charlie was working on really fine-grain kernel locking for cp67 at
the science center, he invented the compare&swap instruction (the name was
chosen because CAS are charlie's initials). Initial attempts to get
compare&swap included in 370 were rebuffed because the os/360 folks said
that test&set was more than adequate. The 370 architecture owners said
that in order to get compare&swap included in 370, it would be necessary
to come up with non-multiprocessor uses for the instruction. Thus were
born the examples in principles of operation (still there) of how to use
compare&swap in application code (they work regardless of running on a
single processor or a multiprocessor). past posts mentioning SMP and/or
compare&swap instruction
http://www.garlic.com/~lynn/subtopic.html#smp
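
As a rough illustration of the application-code pattern (fetch the old value, compute the new one, retry if somebody else changed the word in between), here is a minimal Python sketch. It is a model only and not the principles of operation text: the Word class, add_to_counter and run_demo are hypothetical names made up for this example, and a threading.Lock stands in for the atomicity that the hardware instruction provides. The real instruction also reloads the current storage value into the register on a mismatch; the model just returns True/False.

```python
import threading


class Word:
    """Hypothetical model of one storage word with an atomic compare&swap.

    The internal lock only stands in for the hardware's atomicity guarantee.
    """

    def __init__(self, value=0):
        self._value = value
        self._atomic = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`.

        Returns True if the swap happened, False if the word had changed.
        """
        with self._atomic:
            if self._value == expected:
                self._value = new
                return True
            return False


def add_to_counter(word, amount):
    """Update a shared counter without holding a lock across the update."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + amount):
            return old + amount
        # Another thread changed the word first: fetch again and retry.


def run_demo(threads=4, updates=1000):
    """Have several threads bump one counter; return the final value."""
    counter = Word(0)

    def worker():
        for _ in range(updates):
            add_to_counter(counter, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.load()
```

The same retry loop works whether the threads are interleaved on one processor or running on several, which is the point of the non-multiprocessor examples.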

370 multiprocessor (2-processor) continued the 360/65mp design with
dedicated channels on both processors (simulating symmetric).

370 "attached processor" (AP) was a less expensive multiprocessor with
channels available on only one of the processors. The 370/168 "attached
processor" cost was reduced by having the 168 external dedicated hardware
channels on only one of the processors.

The 370/158 "attached processor" was something of fiction ... since the
158 had integrated channels ... the same 158 hardware engine was shared
between executing 370 microcode and executing the channel microcode.  The
only thing saved in the "attached processor" was not having connected
channel cables.

During the 370 period in the first half of the 70s, there was the future
system effort which was going to completely replace 370; during this
period 370 efforts were being suspended/killed ...  which is credited
with giving clone processor vendors market foothold. When FS was killed,
there was mad rush to get products back into the 370 pipeline ...
kicking off 303x in parallel with 3081. past posts mentioning Future
system
http://www.garlic.com/~lynn/submain.html#futuresys

a 370/158 engine was taken w/o the 370 microcode and just the
integrated channel microcode and turned into the 303x "channel director"
(supported six channels just like 370/158 integrated channels).

the 3031 then became a 370/158 engine with just 370 microcode and a
second 370/158 engine with just the channel microcode. A 3031MP then
became four 370/158 engines (two 370/158 engines with 370 microcode and
two 370/158 engines with channel microcode). A 3031AP was three
370/158 engines (two engines with 370 microcode and one engine with
channel microcode).

a 3032 was a 370/168 reconfigured to use one or more 303x channel
directors (in place of the 168 external hardware channel boxes).  a
3032MP was two 370/168s, each having one or more 303x channel directors.
A 3032AP was then two 370/168s, only one having 303x channel directors
(aka 370/158 engines with integrated channel microcode).

a 3033 started out as 168-3 logic mapped to 20% faster chips. Some
redesign got 3033 up to 50% faster than 168-3 (aka 4.5mips compared to
3mips). A 3033MP was two processors, both with connected 303x channel
directors. A 3033AP was two processors with only one having connected
channel directors.

370 has very strong memory consistency ... and to support strong memory
consistency across multiprocessor caches, the two-processor configuration
slowed the processor machine cycle down by 10% (giving cache processing
extra cycles to handle cross-cache invalidation signals) ... so a base
2-processor 370 starts out at 1.8 times a single processor (any actual
cross-cache invalidation signals would slow things down even further).

370 two-processor throughput was slowed down even further by operating
system multiprocessor overhead and lock spinning contention ... so
the typical guidance was that a 2-processor had 1.4-1.5 times the
throughput of a single processor.
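
The arithmetic behind those numbers can be sketched as follows. The function names are made up for this example, and the "software efficiency" figure is simply the quoted 1.4-1.5 divided by the 1.8 hardware limit; it is a derived illustration, not a measured number.

```python
def hardware_limit(processors=2, cycle_slowdown=0.10):
    """Best-case throughput relative to one full-speed processor.

    Each processor runs at (1 - cycle_slowdown) of single-processor speed.
    """
    return processors * (1.0 - cycle_slowdown)


def software_efficiency(observed, processors=2, cycle_slowdown=0.10):
    """Fraction of the hardware limit left after operating system
    multiprocessor overhead and lock spinning."""
    return observed / hardware_limit(processors, cycle_slowdown)


if __name__ == "__main__":
    print(hardware_limit())                      # two processors, 10% slower cycle
    print(round(software_efficiency(1.4), 2))    # low end of the guidance
    print(round(software_efficiency(1.5), 2))    # high end of the guidance
```

With the defaults, the hardware limit comes out at 1.8, and the 1.4-1.5 guidance corresponds to roughly 78-83% of that limit being delivered once the software overhead is counted.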

However, if you had very fine-grain locking and super lightweight
multiprocessor pathlengths ... it could get throughput close to the
hardware limit of 1.8 times a single processor. HONE system (internal,
world-wide vm370-based online sales & marketing support system) got an
early 370/158 multiprocessor ... and i played some tricks with implicit
cache-affinity ... so i got twice the throughput of a single processor
370/158 (the 1.8 factor from the 10% machine cycle slowdown was offset by
an increase in cache hit rate from the sleight-of-hand with caches in
two-processor operation).

Conversely, 3081 was designed to be multiprocessor only ... so the
implicit machine cycle slowdown was baseline. However, TPF (renamed
airline control program) didn't have multiprocessor support. As a result
there was danger of the TPF customers migrating to clone vendors
offering faster single processors. Initially, there were some very
unnatural things done to vm370 multiprocessor support that attempted to
get the two 3081 processors doing work concurrently for the single
processor virtual machine running TPF. This slightly improved TPF
throughput under vm370 ... while significantly degrading throughput for
all the other vm370 multiprocessor customers.

Eventually a 3083 single processor was announced. One of the big issues
was that the simplest solution was to remove processor two from 3081 ...
but that was in the middle of the box ... leaving 3083 dangerously
top-heavy. Some actual re-engineering had to be done to move processor
one from the top of the box to the middle.  3083 processor was announced
at almost 15% faster than 3081 processor (the elimination of the 10%
cache consistency multiprocessor cycle slow-down).

for something completely different in symmetric multiprocessor.
http://www.garlic.com/~lynn/2007.html#46 
with this email about VAX VMS
adding symmetric multiprocessor in VMS release 5
http://www.garlic.com/~lynn/2007.html#email880324 
and
http://www.garlic.com/~lynn/2007.html#email880329

-- 
virtualization experience starting Jan1968, online at home since Mar1970
