shmuel+...@patriot.net (Shmuel Metz , Seymour J.) writes:
> See <http://en.wikipedia.org/wiki/Asymmetric_multiprocessing> and

360/65MP (& 370MP) had (symmetric) shared memory but had dedicated channels on both processors. fully symmetric was simulated by having twin-tail controllers configured on channels on the different processors at the same addresses. os/360 (360/65MP) multiprocessor support had a global spin lock ... using test&set instruction (serialized so that only one processor at a time could execute in large sections of code).

360/67 multiprocessor was designed for up to 4-processors ... mostly 2-processors were built and i only know of one 3-processor built (no 4-processors that i know of). it had "channel controller" where all processors could access all channels (but it could still be partitioned into independent single processors with dedicated channels). 360/67 also had a different "multi-ported" memory bus ... a single-processor "half-duplex" had degraded memory bus cycle time compared to 360/67UP and 360/65UP&MP, but under heavy i/o load, it would have higher throughput because of reduced memory contention between processors and i/o.

tss/360 multiprocessor had finer-grain lock serialization (than os/360), where multiple processors could execute in multiple parts of the kernel simultaneously.

When charlie was working on really fine-grain kernel locking for cp67 at the science center, he invented compare&swap instruction (chosen because CAS are charlie's initials). Initial attempts to get compare&swap included in 370 were rebuffed because the os/360 folks said that test&set was more than adequate. The 370 architecture owners said that in order to get compare&swap included in 370, it would be necessary to come up with non-multiprocessor uses for the instruction. Thus were born the examples in principles of operation (still there) of how to use compare&swap in application code (can be used regardless of running on single processor or multiprocessor).

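
To make the test&set versus compare&swap contrast concrete, here is a rough sketch in C11 atomics. It is not 360/370 assembler and not the principles of operation examples themselves; the names kernel_lock, locked_add and cas_add are made up for the illustration. The first half serializes every update behind one global spin lock (the test&set style); the second half updates a word with no lock at all, retrying only if some other processor changed the word in the meantime (the compare&swap style).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Test&set style: one global lock word (hypothetical name). */
static atomic_flag kernel_lock = ATOMIC_FLAG_INIT;

void global_lock(void)
{
    /* test-and-set returns the previous value of the flag:
       true means somebody else already holds the lock, so spin. */
    while (atomic_flag_test_and_set_explicit(&kernel_lock,
                                             memory_order_acquire))
        ;
}

void global_unlock(void)
{
    atomic_flag_clear_explicit(&kernel_lock, memory_order_release);
}

/* Shared counter that may only be touched while holding kernel_lock. */
static long locked_counter = 0;

long locked_add(long delta)
{
    global_lock();
    locked_counter += delta;      /* everything in here is serialized */
    long result = locked_counter;
    global_unlock();
    return result;
}

/* Compare&swap style: no lock. Fetch the old value, compute the new
   one, and store it only if the word still holds the old value. */
long cas_add(_Atomic long *word, long delta)
{
    long old = atomic_load(word);
    long new_value;
    do {
        new_value = old + delta;
        /* On failure the call refreshes 'old' with the current
           contents of *word, and the loop recomputes and retries. */
    } while (!atomic_compare_exchange_weak(word, &old, new_value));
    return new_value;
}
```

The retry loop in cas_add works the same whether one or many processors are running, and it also copes with a thread being interrupted between the fetch and the store, which is the kind of non-multiprocessor use the architecture owners asked for. The C call also has roughly the same shape as the 370 instruction: on a miscompare the current storage contents are handed back in place of the expected value, so the caller can recompute and retry.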
past posts mentioning SMP and/or compare&swap instruction
http://www.garlic.com/~lynn/subtopic.html#smp

370 multiprocessor (2-processor) continued the 360/65mp design with dedicated channels on both processors (simulating symmetric). 370 "attached processor" (AP) was a less expensive multiprocessor with channels available on only one of the processors. The 370/168 "attached processor" cost was reduced by having 168 external dedicated hardware channels on only one of the processors. The 370/158 "attached processor" was something of a fiction ... since the 158 had integrated channels ... the same 158 hardware engine was shared between executing 370 microcode and executing the channel microcode. The only thing saved in the "attached processor" was not having connected channel cables.

During the 370 period in the first half of the 70s, there was the future system effort which was going to completely replace 370; during this period 370 efforts were being suspended/killed ... which is credited with giving clone processor vendors a market foothold. When FS was killed, there was a mad rush to get products back into the 370 pipeline ... kicking off 303x in parallel with 3081.

past posts mentioning Future system
http://www.garlic.com/~lynn/submain.html#futuresys

a 370/158 engine was taken w/o the 370 microcode and with just the integrated channel microcode and turned into the 303x "channel director" (supported six channels just like 370/158 integrated channels). the 3031 then became a 370/158 engine with just 370 microcode and a second 370/158 engine with just the channel microcode. A 3031MP then became four 370/158 engines (two 370/158 engines with 370 microcode and two 370/158 engines with integrated channel microcode). A 3031AP was three 370/158 engines (two engines with 370 microcode and one engine with channel microcode).

a 3032 was a 370/168 reconfigured to use one or more 303x channel directors (in place of the 168 external hardware channel boxes).
a 3032MP was two 370/168s, each having one or more 303x channel directors. A 3032AP was then two 370/168s, only one having 303x channel directors (aka 370/158 engines with integrated channel microcode).

a 3033 started out as 168-3 logic mapped to 20% faster chips. Some redesign got 3033 up to 50% faster than 168-3 (aka 4.5mips compared to 3mips). A 3033MP was two processors, both with connected 303x channel directors. A 3033AP was two processors with only one having connected channel directors.

370 has very strong memory consistency ... and to support multiprocessor cache strong memory consistency ... two-processor configurations slowed the processor machine cycle down by 10% (giving cache processing extra cycles to handle cross-cache invalidation signals) ... so a base 2-processor 370 starts out as 1.8 times a single processor (any actual cross-cache invalidation signals would slow things down even further). 370 two-processor throughput was even further slowed down by operating system multiprocessor overhead and lock spinning contention ... so the typical recommendation was that a 2-processor had 1.4-1.5 times the throughput of a single processor. However, if you had very fine grain locking and super lightweight multiprocessor pathlengths ... it could get throughput close to the hardware limit of 1.8 times single processor.

HONE system (internal, world-wide vm370-based online sales & marketing support system) got an early 370/158 multiprocessor ... and i played some tricks with implicit cache-affinity ... so i got twice the throughput of single processor 370/158 (the 1.8 factor from the 10% machine cycle slowdown was offset by an increase in cache hit rate from the sleight-of-hand with caches in two-processor operation).

Conversely, 3081 was designed to be multiprocessor only ... so the implicit machine cycle slowdown was baseline. However, TPF (renamed airline control program) didn't have multiprocessor support.
As a result there was danger of the TPF customers migrating to clone vendors offering faster single processors. Initially, some very unnatural things were done to vm370 multiprocessor support in an attempt to get the two 3081 processors doing work concurrently for the virtual machine single processor running TPF. This slightly improved TPF throughput under vm370 ... while significantly degrading throughput for all the other vm370 multiprocessor customers.

Eventually a 3083 single processor was announced. One of the big issues was that the simplest solution was to remove processor two from 3081 ... but that was in the middle of the box ... leaving 3083 dangerously top-heavy. Some actual re-engineering had to be done to move processor one from the top of the box to the middle. 3083 processor was announced at almost 15% faster than 3081 processor (the elimination of the 10% cache consistency multiprocessor cycle slow-down).

for something completely different in symmetric multiprocessor:
http://www.garlic.com/~lynn/2007.html#46

with this email about VAX VMS adding symmetric multiprocessor in VMS release 5
http://www.garlic.com/~lynn/2007.html#email880324
and
http://www.garlic.com/~lynn/2007.html#email880329

--
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN