> What Bill has just described is known as an "Amdahl-balanced system",
> and is the design philosophy behind the IBM Blue Genes and also
> SiCortex. In my opinion, this is the future of HPC. Use lower power,
> slower processors, and then try to improve network performance to reduce
> the cost of scaling out.
"small pieces tightly connected", maybe. these machines offer very nice power-performance for those applications that can scale efficiently to say, tens of thousands of cores. (one rack of BGQ is 32k cores.) we sometimes talk about "embarassingly parallel" - meaning a workload with significant per-core computation requiring almost no communication. but if you have an app that scales to 50k cores, you must have a very, very small serial portion (Amdahl's law wise). obviously, they put that 5d torus in a BGQ for a reason, not just to permit fast launch of EP jobs. I don't think either Gb or IB are a good match for the many/little approach being discussed. SiCortex was pretty focused on providing an appropriate network, though the buying public didn't seem to appreciate the nuance. IB doesn't seem like a great match for many/little: a lot of cores will have to share an interface to amortize the cost. do you provide a separate intra-node fabric, or rely on cache-coherece within a node? Gb is obviously a lot cheaper, but at least as normally operated is a non-starter latency-wise. (and it's important to realize that latency becomes even more important as you scale up the node count, giving each less work to do...) > Essentially, you want the processors to be > *just* fast enough to keep ahead of the networking and memory, but no > faster to optimize energy savings. interconnect is the sticking point. I strongly suspect that memory is going to become a non-issue. shock! from where I sit, memory-per-core has been fairly stable for years now (for convenience, let's say 1GB/core), and I really think dram is going to get stacked or package-integrated very soon. suppose your building block is 4 fast cores, 256 "SIMT" gpu-like cores, and 4GB very wide dram? if you dedicated all your pins to power and links to 4 neighbors, your basic board design could just tile a bunch of these. say 8x8 chips on a 1U system. > The Blue Genes do this incredibly well, so did SiCortex, and Seamicro > appears to be doing this really well, too, based on all the press > they've been getting. has anyone seen anything useful/concrete about the next-gen system interconect fabrics everyone is working on? latency, bandwidth, message-throughput, topology? > With the DARPA Exascale report saying we can't get > to Exascale with current power consumption profiles, you can bet this > will be a hot area of research over the next few years. as heretical as it sounds, I have to ask: where is the need for exaflop? I'm a bit skeptical about the import of the extreme high end of HPC - or to but it another way, I think much of the real action is in jobs that are only a few teraflops in size. that's O(1000) cores, but you'd size a cluster in the 10-100 Tf range... _______________________________________________ Beowulf mailing list, [email protected] sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
