Gerhard Adam wrote:
> What are you losing? It isn't as if these processors are off playing
> solitaire. They're paying the cost of communication to allow more
> simultaneous operations for YOUR workload. The primary benefit of this
> approach is to reduce the queueing impacts of multiple units of work
> competing for a finite resource. If you don't think this is a
> reasonable exchange, there is nothing prohibiting you from running your
> workload on a series of uniprocessors that fully exploit their "MIPS"
> rating.
>
> This issue of "losing" resources is a false one. The implication is
> that somehow this is being done on purpose. The resources aren't lost,
> but rather redirected to accommodate the increased complexity of the
> system. There is virtually nothing I can think of that scales upwards
> without a loss of efficiency or an increase in cost or complexity.
couple previous postings in this thread
http://www.garlic.com/~lynn/2006l.html#30 One or two CPUs - the pros and cons
http://www.garlic.com/~lynn/2006l.html#41 One or two CPUs - the pros and cons

minor topic drift: for a long time the cornerstone of SMP operation was the compare-and-swap instruction. at the science center
http://www.garlic.com/~lynn/subtopic.html#545tech
charlie had been working on SMP efficiency and fine-grain locking with CP67 on the 360/67. He invented the compare-and-swap instruction (mnemonic chosen because CAS are charlie's initials).

the first couple trips to POK trying to get compare-and-swap into the 370 architecture were not successful. we were told that the mainstream POK operating systems didn't care about CAS ... that they could get by perfectly well with TS (test-and-set). In order to get CAS included in the 370 architecture ... a non-SMP application for CAS would have to be created. Thus were born the descriptions of how to use various flavors of CAS in enabled, multi-threaded application code (whether running on a single processor or in SMP, multiprocessor configurations). The original descriptions were part of the instruction programming notes ... but in later Principles of Operation they were moved to the appendix.

misc. past posts on smp, compare-and-swap, scale-up, etc
http://www.garlic.com/~lynn/subtopic.html#smp

tightly-coupled tends to assume extremely fine-grained communication, and the coordination overhead reflects that. loosely-coupled tends to have much coarser-grained coordination. given that your workload can accommodate coarser-grained coordination ... a few 20-processor complexes in a loosely-coupled environment may, in fact, provide better overall thruput than a single 60-processor operation (where the incremental benefit of each additional processor may be getting close to 1/3rd of a single processor by the time you hit a 32-processor configuration).
we saw that in the late 80s when we got involved in both the fibre channel standard (FCS) effort as well as the scalable coherent interface (SCI) standard effort. FCS was obviously a loosely-coupled technology ... which we worked on when we were doing ha/cmp
http://www.garlic.com/~lynn/subtopic.html#hacmp
also minor reference here
http://www.garlic.com/~lynn/95.html#13

One of the engineers in austin had taken some old fiber-optic communication technology that had been lying around POK since the 70s (eventually announced as ESCON on mainframes) and did various tweaks to it ... got it running with about ten percent faster effective thruput, and adapted some optical drivers from the cdrom market segment that were less than 1/10th the cost of the drivers that had been defined in POK. This was adapted for full-duplex operation (simultaneous full-bandwidth transmission in both directions) and released as SLA (serial link adapter) for rs/6000. Almost immediately he wanted to start on a proprietary version that would run 800mbits (simultaneously in both directions). Since we had been working with the FCS standards operation, we lobbied long and hard to drop any idea of a proprietary definition and instead work on the FCS standard (1gbit, full-duplex, simultaneous in both directions). Eventually he agreed and went on to become the editor of the FCS standards document.

SCI could be used in purely tightly-coupled operation ... but it had a number of characteristics that could also be used to approximate loosely-coupled ... and then there were the things in between ... for NUMA (non-uniform memory architecture). SCI could operate as if accesses were ordinary memory references ... but provide a variety of different performance characteristics (somewhat analogous to old 360 LCS ... where some configurations used it as an extension of memory for standard execution and other configurations used it like electronic disk ... more akin to 3090 expanded store).
sequent and dg took standard four-processor intel shared-memory boards ... and configured them on the 64-port SCI memory interface for a total of 256 processors that could operate as a shared-memory multiprocessor. convex took two-processor HP shared-memory boards ... and configured them on the 64-port SCI memory interface for a total of 128 processors that could operate as a shared-memory multiprocessor.

while background chatter for SCI is very low ... actually having a lot of different processors constantly hitting the same location can degrade much faster than a more traditional uniform memory architecture. at some point the trade-off can cross over. so partitioning can be good ... convex took and adapted MACH for the exemplar. one of the things they could do to cut down fine-grained coordination scale-up issues was to partition the exemplar into possibly 5-6 twenty-processor shared-memory multiprocessors ... then they could simulate loosely-coupled communication between the different complexes using synchronous memory copies. this was partially a hardware scale-up issue ... scaling a shared kernel that was constantly hitting the same memory locations from a large number of different real processors ... and partially using partitioning to manage complexity growth. this is somewhat like how LPARs are used to partition and manage the complexity of different operations that may have somewhat different goals ... which would be a lot more difficult as a single system operation.

for other historical topic drift ... MACH was picked up from CMU ... the place that the andrew file system, andrew windows & widgets, camelot, etc had come out of. In this period there was Project Athena at MIT ... jointly funded by DEC and IBM to the tune of $25m each (from which came Kerberos, X, and some number of other things), while IBM funded CMU to the tune of $50m. Mach was also picked up as the basis for NeXT and later for the apple operating system (among others).
LANL somewhat sponsored/pushed HiPPI thru the standards organization (as a standard version of Cray's copper parallel channel). LLNL somewhat sponsored/pushed FCS thru the standards organization as a fiber version of a serial copper connectivity that they had deployed. And SLAC somewhat sponsored/pushed SCI thru the standards process.

misc. old posts mentioning HiPPI, FCS, and/or SCI
http://www.garlic.com/~lynn/2001b.html#85 what makes a cpu fast
http://www.garlic.com/~lynn/2002j.html#45 M$ SMP and old time IBM's LCMP
http://www.garlic.com/~lynn/2003.html#6 vax6k.openecs.org rebirth
http://www.garlic.com/~lynn/2004e.html#2 Expanded Storage
http://www.garlic.com/~lynn/2005e.html#12 Device and channel
http://www.garlic.com/~lynn/2005f.html#18 Is Supercomputing Possible?
http://www.garlic.com/~lynn/2005h.html#13 Today's mainframe--anything to new?
http://www.garlic.com/~lynn/2005j.html#13 Performance and Capacity Planning
http://www.garlic.com/~lynn/2005m.html#55 54 Processors?
http://www.garlic.com/~lynn/2005n.html#6 Cache coherency protocols: Write-update versus write-invalidate
http://www.garlic.com/~lynn/2005v.html#0 DMV systems?

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

