Gerhard Adam wrote:
> What are you losing?  It isn't as if these processors are off playing
> solitaire.  They're paying the cost of communication to allow more
> simultaneous operations for YOUR workload.  The primary benefit of this
> approach is to reduce the queueing impacts of multiple units of work
> competing for a finite resource.  If you don't think this is a
> reasonable exchange, there is nothing prohibiting you from running your
> workload on a series of uniprocessors that fully exploit their "MIPS"
> rating.
> 
> This issue of "losing" resources is a false one.  The implication is
> that somehow this is being done on purpose.  The resources aren't lost,
> but rather redirected to accommodate the increased complexity of the
> system. There is virtually nothing I can think of that scales upwards
> without a loss of either efficiency, cost, or complexity.  

a couple of previous postings in this thread:
http://www.garlic.com/~lynn/2006l.html#30 One or two CPUs - the pros and cons
http://www.garlic.com/~lynn/2006l.html#41 One or two CPUs - the pros and cons

minor topic drift: for a long time the cornerstone of SMP operation was
the compare-and-swap instruction. at the science center
http://www.garlic.com/~lynn/subtopic.html#545tech

charlie had been working on SMP efficiency and fine-grain locking with
CP67 on the 360/67. He invented the compare-and-swap instruction (the
mnemonic chosen because CAS are charlie's initials). the first couple of
trips to POK trying to get compare-and-swap into the 370 architecture
were not successful. we were told that the mainstream POK operating
systems didn't care about CAS ... that they could get by perfectly well
with TS (test-and-set). In order to get CAS included in the 370
architecture ... a non-SMP application for CAS would have to be created.
Thus were born the descriptions of how to use various flavors of CAS in
enabled, multi-threaded application code (whether running on
single-processor or SMP, multiprocessor configurations). The original
descriptions were part of the instruction programming notes ... but in
later Principles of Operation were moved to an appendix. misc. past
posts on smp, compare-and-swap, scale-up, etc:
http://www.garlic.com/~lynn/subtopic.html#smp
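
as a rough modern analogue of that enabled, multi-threaded pattern (a
minimal C11 sketch using generic atomics ... not the 370 assembler from
the programming notes): fetch the current value, compute the new one,
and let compare-and-swap retry the update if some other thread got
there first ... no disabling, no lock held.

    #include <stdatomic.h>

    /* shared counter updated while enabled, without holding a lock */
    static _Atomic long counter = 0;

    void add_to_counter(long delta)
    {
        long old_val = atomic_load(&counter);
        long new_val;
        do {
            new_val = old_val + delta;
            /* compare-and-swap: stores new_val only if counter still
               contains old_val; on failure old_val is refreshed with
               the current contents and the update is recomputed */
        } while (!atomic_compare_exchange_weak(&counter, &old_val,
                                               new_val));
    }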

tightly-coupled tends to assume extremely fine-grain communication and
the coordination overhead reflects that. loosely-coupled tends to have
much coarser-grained coordination. given that your workload can
accommodate coarser-grained coordination ... a few 20-processor
complexes in a loosely-coupled environment may, in fact, provide better
overall thruput than a single 60-processor operation (where the
incremental benefit of each additional processor may be getting close
to 1/3rd of a single processor by the time you hit a 32-processor
configuration).
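
a toy model makes the trade-off concrete (all numbers here are
illustrative assumptions, not measurements): assume the incremental
value of each processor added to a tightly-coupled complex declines
linearly, reaching 1/3rd of a processor at the 32nd, and assume
loosely-coupled coordination costs a flat 10% per complex.

    #include <stdio.h>

    /* toy scale-up model -- illustrative assumptions, not measured
       data. the i-th processor in a tightly-coupled complex is assumed
       to contribute 1.0 - (i-1)*(2/3)/31 of a single processor (i.e.
       down to 1/3rd by the 32nd processor), floored at zero. */
    static double tight(int n)
    {
        double total = 0.0;
        for (int i = 1; i <= n; i++) {
            double incr = 1.0 - (i - 1) * (2.0 / 3.0) / 31.0;
            total += incr > 0.0 ? incr : 0.0;
        }
        return total;
    }

    int main(void)
    {
        /* one 60-way vs. three 20-ways paying an assumed 10%
           loosely-coupled coordination tax per complex */
        printf("one 60-way:           %5.1f processor-equivalents\n",
               tight(60));
        printf("three 20-ways (-10%%): %5.1f processor-equivalents\n",
               3 * 0.90 * tight(20));
        return 0;
    }

under those (made-up) curves the three 20-ways come out well ahead
(roughly 43 vs 24 processor-equivalents) ... which is the point: past
some size, the marginal tightly-coupled processor buys very little.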

we saw that in the late 80s when we got involved in both the fiber
channel standard (FCS) effort as well as the scalable coherent
interface (SCI) standard effort.

FCS was obviously a loosely-coupled technology ... which we worked on
when we were doing ha/cmp
http://www.garlic.com/~lynn/subtopic.html#hacmp

also minor reference here
http://www.garlic.com/~lynn/95.html#13

One of the engineers in austin had taken some old fiber-optic
communication technology that had been lying around POK since the 70s
(eventually announced as escon on mainframes) and did various tweaks to
it ... got it running with about ten percent better effective thruput,
and adapted some optical drivers from the cdrom market segment that
were less than 1/10th the cost of the drivers that had been defined in
POK. This was adapted for full-duplex operation (simultaneous
full-bandwidth transmission in both directions) and released as SLA
(serial link adapter) for rs/6000. Almost immediately he wanted to
start on a proprietary version of it that would run at 800mbits
(simultaneously in both directions). Since we had been working with the
FCS standards operation, we lobbied long and hard to drop any idea of
doing a proprietary definition and instead work on the FCS standard
(1gbit, full-duplex, simultaneously in both directions). Eventually he
agreed and went on to become the editor of the FCS standards document.

SCI could be used in purely tightly-coupled operation ... but it had a
number of characteristics which could also be used to approximate
loosely-coupled ... and then there were the things in-between ... for
NUMA (non-uniform memory access) architectures.

SCI could make remote storage look like ordinary memory references ...
but with a variety of different performance characteristics (somewhat
analogous to old 360 LCS ... where some configurations used it as an
extension of memory for standard execution and other configurations
used it like an electronic disk ... more akin to 3090 expanded store).

sequent and dg took standard four-processor intel shared-memory boards
... and configured them on the 64-port SCI memory interface for a total
of 256 processors that could operate as a shared-memory multiprocessor.

convex took two-processor HP shared-memory boards ... and configured
them on the 64-port SCI memory interface for a total of 128 processors
that could operate as a shared-memory multiprocessor.

while background coherency chatter for SCI is very low ... actually
having a lot of different processors constantly hitting the same
location can degrade much faster than a more traditional uniform memory
architecture. at some point the trade-off curves cross.
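
a minimal sketch of the effect (generic C11 atomics plus pthreads,
nothing SCI-specific; NTHREADS and ITERS are arbitrary): every thread
hammering one location forces the cache line to migrate between
processors on every update, while giving each thread its own padded
slot avoids the migration entirely.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    #define NTHREADS 8
    #define ITERS    1000000L

    struct slot { _Atomic long v; char pad[56]; }; /* one cache line */

    static _Atomic long hot = 0;       /* single contended location  */
    static struct slot cool[NTHREADS]; /* private slot per thread    */

    static void *hot_worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < ITERS; i++)
            atomic_fetch_add(&hot, 1); /* line bounces between CPUs  */
        return NULL;
    }

    static void *cool_worker(void *arg)
    {
        struct slot *s = arg;
        for (long i = 0; i < ITERS; i++)
            atomic_fetch_add(&s->v, 1); /* each CPU keeps its line   */
        return NULL;
    }

    /* run NTHREADS copies of fn and return elapsed wall-clock time */
    static double timed(void *(*fn)(void *), void *args[])
    {
        pthread_t t[NTHREADS];
        struct timespec a, b;

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, fn, args ? args[i] : NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &b);
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        void *args[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            args[i] = &cool[i];

        printf("contended:   %.3fs\n", timed(hot_worker, NULL));
        printf("partitioned: %.3fs\n", timed(cool_worker, args));
        return 0;
    }

compiled with -pthread, the contended phase typically runs several
times slower than the partitioned phase, and the gap widens with thread
count ... the software analogue of the hardware trade-off above.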

so partitioning can be good ... convex took and adapted MACH for the
exemplar. one of the things they could do to cut down fine-grain
coordination scale-up issues was to partition the exemplar into
possibly 5-6 twenty-processor shared-memory multiprocessors ... then
they could simulate loosely-coupled communication between the different
complexes using synchronous memory copies, as in the sketch below.
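
a minimal sketch of that style of partition-to-partition messaging
(hypothetical layout and names, not the exemplar's actual interface): a
one-way mailbox in a region addressable from both partitions, where
send is just a synchronous memory copy plus a flag update.

    #include <stdatomic.h>
    #include <string.h>

    #define MSG_MAX 4096

    /* one-way mailbox in a region addressable from both partitions
       -- hypothetical layout, single sender and single receiver */
    struct mailbox {
        _Atomic int full;      /* 0 = empty, 1 = message present */
        size_t      len;
        char        buf[MSG_MAX];
    };

    /* "loosely-coupled" send: synchronous memory copy into the shared
       region, then a release store so the receiver only ever sees a
       complete message */
    void mb_send(struct mailbox *mb, const void *msg, size_t len)
    {
        while (atomic_load_explicit(&mb->full, memory_order_acquire))
            ;                  /* spin until receiver drains the box */
        memcpy(mb->buf, msg, len);
        mb->len = len;
        atomic_store_explicit(&mb->full, 1, memory_order_release);
    }

    /* matching receive: copy the message out, then mark the box empty */
    size_t mb_recv(struct mailbox *mb, void *out)
    {
        while (!atomic_load_explicit(&mb->full, memory_order_acquire))
            ;                  /* spin until a message arrives */
        size_t len = mb->len;
        memcpy(out, mb->buf, len);
        atomic_store_explicit(&mb->full, 0, memory_order_release);
        return len;
    }

the point being that the only cross-partition traffic is the copy
itself plus one flag line ... rather than the whole kernel's working
set bouncing between all the processors.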

this was partially a hardware scale-up issue ... scaling a shared
kernel that was constantly hitting the same memory locations from a
large number of different real processors ... and partially using
partitioning to manage complexity growth. this is somewhat like how
LPARs are used to partition and manage the complexity of different
operations that may have somewhat different goals ... which would be a
lot more difficult using a single-system operation.

for other historical topic drift ... MACH was picked up from CMU ...
the place that the andrew file system, andrew windows & widgets,
camelot, etc., had come out of. In this period there was Project Athena
at MIT ... jointly funded by DEC and IBM to the tune of $25m each (from
which came Kerberos, X, and some number of other things) ... while IBM
funded CMU to the tune of $50m. Mach was also picked up as the basis
for NeXT and later for apple's operating system (among others).

LANL somewhat sponsored/pushed HiPPI thru the standards organization
(as a standard version of Cray's parallel copper channel). LLNL
somewhat sponsored/pushed FCS thru the standards organization as a
fiber version of a serial copper interconnect that they had deployed.
And SLAC somewhat sponsored/pushed SCI thru the standards process.

misc. old posts mentioning HiPPI, FCS, and/or SCI
http://www.garlic.com/~lynn/2001b.html#85 what makes a cpu fast
http://www.garlic.com/~lynn/2002j.html#45 M$ SMP and old time IBM's LCMP
http://www.garlic.com/~lynn/2003.html#6 vax6k.openecs.org rebirth
http://www.garlic.com/~lynn/2004e.html#2 Expanded Storage
http://www.garlic.com/~lynn/2005e.html#12 Device and channel
http://www.garlic.com/~lynn/2005f.html#18 Is Supercomputing Possible?
http://www.garlic.com/~lynn/2005h.html#13 Today's mainframe--anything to new?
http://www.garlic.com/~lynn/2005j.html#13 Performance and Capacity Planning
http://www.garlic.com/~lynn/2005m.html#55 54 Processors?
http://www.garlic.com/~lynn/2005n.html#6 Cache coherency protocols: Write-update versus write-invalidate
http://www.garlic.com/~lynn/2005v.html#0 DMV systems?
