> SGI is still substantially faster than Infinipath - at least SGI
> talks about sub-1-us latency, and bandwidth well past 2 GB/s...
I didn't look extensively, but:
    http://www.sgi.com/products/servers/altix/numalink.html
claims 1 us (not sure if that is 1 or 2 significant digits) and 3.2 GB/sec
per link.  A similar page:
    http://www.pathscale.com/infinipath-perf.html
Pathscale claims 1.29 us latency and 954 MB/sec per link.

Of course it's much more complicated than that.  I was somewhat surprised at
how much of a special case the Altix MPI latency is.  I found:
    http://www.csm.ornl.gov/~dunigan/sgi/altixlat.png
Anyone have something similar for a current Altix (current OS, drivers, and
NUMAlink 4)?

So 1 us latency to one other CPU ;-).  With Infinipath I often see sub-1.0 us
latencies [1].  Until I saw that graph, I thought shared memory + NUMAlink
enabled 1.0 us MPI latency.  In reality NUMAlink isn't involved and that
number is only to the local CPU.  Corrections welcome.

Unfortunately for those hoping for 1.0 us latencies on an Altix, real-world
communications often involve talking to non-local processors.  Look at the
random ring latency benchmark: Pathscale seems to do about half the latency
of a similar-size Altix [2].  For non-trivial communication patterns (i.e.
not matched nearest-neighbor pairs) it looks like Pathscale might have an
advantage.

Granted, shared memory is a big differentiator; then again, so is
price/performance.  Seems like reasonable Opteron + Infinipath clusters are
in the neighborhood of $1,200 per core these days.  I've not seen a large
Altix quote recently, but I was under the impression it was more like 10
times that (when including storage and a 3-year warranty).  So (no surprise)
Opteron + Infinipath clusters and Altixes have different markets and
applications that justify their purchases.

The best news for the consumer on the CPU side is that AMD has managed to
light a fire under Intel.  Nothing like lower latency, twice the bandwidth,
and less power to scare the hell out of an engineering department.  Rumors
claim reasonable Opteron competition will be out this summer.

On the interconnect side there seems to be much more attention paid to
latency, message rates, and random ring latency these days.  I'm happy to
say that from what I can tell this is actually improving real-world scaling
on real-world applications.  So don't mind me while I cheer the leaders
while painting a target on them.  To the underdogs that got caught with
their pants down: you now know where you need to be.  I'm happy to say the
underdogs seem to be paying attention, and the spirit of competition seems
to be very healthy.  As a result it's a much more aggressive battle for the
HPC market, and the HPC consumer wins.

[1] As long as I'm on the same node.  Home-grown benchmark:
      node001 node001 node001 node001
      size= 1, 131072 hops, 4 nodes in 0.128 sec ( 0.973 us/hop) 4013 KB/sec

[2] At least from the data points at
    http://icl.cs.utk.edu/hpcc/hpcc_results.cgi

> directory-based coherence is fundamental (and not unique to dash
> followons, and hopefully destined to wind up in commodity parts.)  but I
> think people have a rosy memory of a lot of older SMP machines - their
> fabric seemed better mainly because the CPUs were so slow.  modern fabrics
> are much improved, but CPUs are much^2 better.  (I'd guess that memory
> speed improves

Seems like just the opposite.  Has CPU performance really changed that much
since the 2.8 GHz Northwood or the 1.4 GHz Opteron?  Intel's IPC has been
dropping ever since.
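As an aside, since I keep throwing latency numbers around: both the vendor
figures and my hop numbers in [1] come from simple micro-benchmarks that
bounce a tiny message between ranks and divide wall-clock time by the
message count.  A minimal sketch of the ring/hop flavor is below; this is
an illustrative rewrite, not my actual code, and the hop count and message
size are just placeholders chosen to mirror [1]:

    /* ring.c - pass a 1-byte token around a ring of MPI ranks and report
     * the average time per hop.  Illustrative sketch only.
     * Build/run (roughly): mpicc -O2 ring.c -o ring && mpirun -np 4 ./ring
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int hops = 131072;   /* total hops, as in [1]; otherwise arbitrary */
        char buf[1] = {0};         /* "size= 1": a 1-byte payload */
        int rank, nprocs, next, prev, lap, laps, total_hops;
        double t0, t1, sec;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (nprocs < 2) { MPI_Finalize(); return 1; }

        next = (rank + 1) % nprocs;
        prev = (rank + nprocs - 1) % nprocs;
        laps = hops / nprocs;             /* each lap around the ring = nprocs hops */
        total_hops = laps * nprocs;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (lap = 0; lap < laps; lap++) {
            if (rank == 0) {              /* rank 0 starts and finishes each lap */
                MPI_Send(buf, 1, MPI_CHAR, next, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 1, MPI_CHAR, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {                      /* everyone else forwards the token */
                MPI_Recv(buf, 1, MPI_CHAR, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, 1, MPI_CHAR, next, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0) {
            sec = t1 - t0;
            printf("size= 1, %d hops, %d ranks in %.3f sec (%6.3f us/hop)\n",
                   total_hops, nprocs, sec, sec / total_hops * 1e6);
        }
        MPI_Finalize();
        return 0;
    }

Vendor MPI latency claims (the 1.29 us above, for example) typically come
from a two-rank ping-pong rather than a ring, but the idea is the same:
whether you land at 1 us or 6 us depends heavily on who you are actually
talking to.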
Interconnect-wise, since the Northwood/Opteron days we have mostly gotten,
what, a factor of 4 in latency (6-8 us for older Myrinet with older GM on
the slower-clocked cards vs. Infinipath) and 2.5 Gb -> 20 Gb in link speed
(Myrinet vs. IB DDR).  If it's really the fabric that is holding you back
you could have two; getting two 16x PCI-e slots in a node isn't hard today.
Not saying it would be easily price/performance justified, but it plausibly
gives you more performance.

To me it looks like the interconnect vendors are anxiously awaiting the next
doubling in CPU performance, memory bandwidth, and cores per socket to help
justify their existence to a larger part of the market.  Seems like it's the
CPU that has been slow to improve these last few years.

-- 
Bill Broadley
Computational Science and Engineering
UC Davis
