> http://softwareforums.intel.com/ids/board/message?board.id=11&message.id=3793
>
> Maybe it will be interesting for many people here.

I'm not so sure. DSM has been done for decades now, though I haven't heard of OpenMP (ish) implementations based on it. fundamentally, there's no real conceptual challenge to implementing SHM across nets, just practical matters.

regardless of whether you present a SHM interface to the programmer, you eventually have to adopt the same basic programming models that reflect the topology of your interconnect. for instance, scalable OpenMP codes seem to migrate towards looking a bit like message-passing codes. after all, if you care about latency, you want to batch together relevant data. and scalability really does mean caring about latency (and often bandwidth).

doing DSM based on pages is convenient, since it means you can put your smarts into a library with a fairly straightforward kernel/net interface. innumerable masters-thesis projects have done this, as well as Mosix and others. the downside is that to fetch a single byte, you take a page fault, and do some kind of RPC. but if your shared data is read-mostly, or naturally very granular, you're golden.

hooking into the language is a popular way to break up the chunks - the language can simply emit get/put at language-appropriate places. for those apps hurt by page-based sharing, this is certainly better. but writing a compiler, even a preprocessor, is a pretty big deal. there have been multiple implementations of this approach, but somehow they never gain much traction.

doing an implementation that is really smart about handling sequential vs random patterns (prefetching, etc), doing all the right locking, load-balancing, accepting programmer hints/assertions, etc, that's a pretty big undertaking, and I don't know of a system which has done it, well. also, there are sort of intermediate interfaces, such as Global Arrays or Charm++.

and fundamentally, you have to notice that SHM approaches tend to yield quite modest speedups. for instance, in the Intel whitepaper, they're showing speedups of 3-8 for a 16x machine.
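to put those figures in perspective, here is a minimal sketch that converts a speedup into parallel efficiency, assuming the usual definition (speedup divided by processor count). the function name is just an illustration, not anything from the whitepaper:

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return the fraction of ideal (linear) speedup actually achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# The two ends of the quoted 3-8x range on a 16-processor machine.
for s in (3, 8):
    eff = parallel_efficiency(s, 16)
    print(f"speedup {s} on 16 processors: efficiency {eff:.1%}")
```

under that definition, speedups of 3 and 8 on 16 processors correspond to efficiencies of roughly 19% and 50%.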
that's really not very good. if you insist on a SHM interface, IMO you're best off sticking to fairly small SMP machines as the mass-market inches up. right now, that's probably 4x2 opterons, but with quad-cores coming (and some movement towards smarter system fabrics by AMD and Intel), affordable SMP is likely to grow to ~32x within a year or two.

IMO, anyone who needs "real" scaling (>64x real speedup, say) has already bitten the MPI bullet. but I'm willing to be told I'm a message-passing bigot ;)

regards, mark hahn.
