On Nov 27, 2005, at 1:07 PM, Eugen Leitl wrote:
Absolutely. Multithreading (SMP) approach doesn't scale. Illusion of a
global memory is a difficult one to maintain in a write-intensive context
and won't go beyond 16-64 cores. If you want to have >10^3 parallelism,
you have to go message passing. There are no alternatives to that.


SMP is message passing where the distributed transaction semantics are implemented in low-latency hardware. You can implement the same thing in MPI; you just have to do the distributed transactions explicitly in software, and the time required to certify a transaction is higher. The primary difference is that multithreading APIs are designed around the assumption that transaction certification is extremely fast and therefore inexpensive. That lets the programmer, for the sake of convenience, use the implied transactions far more often than may be strictly necessary.
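
To make the "implied transactions" point concrete, here is a minimal sketch in Python. It is only an illustration, not how any real SMP or MPI system works: a lock stands in for the hardware's transaction certification, a queue stands in for the interconnect, and the function names and counters are made up for the example. Both versions compute the same total, but the shared-memory style synchronizes on every update while the message-passing style synchronizes once per worker.

```python
import queue
import threading


def shared_memory_sum(n_workers, n_items):
    """Threading style: every update to shared state is synchronized."""
    total = 0
    sync_ops = 0
    lock = threading.Lock()  # stand-in for hardware transaction certification

    def worker():
        nonlocal total, sync_ops
        for _ in range(n_items):
            with lock:  # one implied "transaction" per update
                total += 1
                sync_ops += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, sync_ops


def message_passing_sum(n_workers, n_items):
    """Message-passing style: work locally, then send one explicit message."""
    inbox = queue.Queue()  # stand-in for the interconnect

    def worker():
        local = 0
        for _ in range(n_items):
            local += 1  # private state, no synchronization needed
        inbox.put(local)  # one explicit message per worker

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = 0
    messages = 0
    for _ in range(n_workers):
        total += inbox.get()
        messages += 1
    return total, messages
```

With 4 workers and 1000 items each, the first function performs 4000 synchronized updates and the second exchanges 4 messages. The convenience of the first style is only cheap while each synchronization is cheap.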

This is more obvious with SMP systems with 10^3 processors, where the performance characteristics are virtually identical to an MPI system of the same size with a high-quality interconnect. The higher price tag of SMP buys you the same API as small SMP systems, but you will still have to program it as though it were MPI, because the assumptions of standard threading APIs are not really true on that scale. That makes it far from obvious what the value proposition of extremely large SMP is exactly. SMP encourages moving data around while MPI encourages moving metadata around, and the inefficiency of the former becomes more obvious and harder to program around when you run out of bandwidth. Telling another processor what you want it to do with its local memory is often much cheaper than slurping big data structures back and forth across the bus, wrapped in distributed transaction semantics.
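
The data-versus-metadata contrast can be sketched the same way. This is a toy model under stated assumptions: `Node`, `fetch_all`, `handle`, and the `{"op": "sum"}` request format are all hypothetical names invented for the illustration, and the size of a pickled payload stands in for bytes crossing the bus or interconnect.

```python
import pickle


class Node:
    """A hypothetical processor that owns a block of local memory."""

    def __init__(self, data):
        self.data = data

    def fetch_all(self):
        # Data-moving style: ship the whole structure to the caller.
        return pickle.dumps(self.data)

    def handle(self, request):
        # Metadata-moving style: receive a small command, reply with a
        # small result computed against local memory.
        command = pickle.loads(request)
        if command["op"] == "sum":
            return pickle.dumps(sum(self.data))
        raise ValueError("unknown op: %r" % (command["op"],))


def sum_by_moving_data(node):
    """Pull the remote data across the 'wire' and compute locally."""
    wire = node.fetch_all()
    return sum(pickle.loads(wire)), len(wire)


def sum_by_moving_metadata(node):
    """Tell the owner what to compute and move only the request and reply."""
    request = pickle.dumps({"op": "sum"})
    reply = node.handle(request)
    return pickle.loads(reply), len(request) + len(reply)
```

Both functions return the same sum along with the number of bytes that crossed the simulated wire. For a node holding a large list, the second moves a few dozen bytes while the first moves the entire structure.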


As a tangent, I find the new Sun Microsystems chip to be an interesting random step in the right direction. Lots of simple 64-bit cores connected to loads of fast memory and I/O channels. If I were designing it, I might have put only a single core on the chip and turned the rest of the silicon into very fast local memory with plenty of I/O to other processors. Sun obviously has different needs.


J. Andrew Rogers
