Toon Knapen wrote:
> Mark Hahn wrote:
>> unless most of your IPC is this kind of async, unsync, passive data
>> reference, I wouldn't think twice: go MPI.  the current media frenzy
>> about multicore systems (nothing new!) doesn't change the picture much.
>
> Because everybody is going multi-core, everybody is pushing multi-threading to exploit these architectures (e.g. the gaming world and many more). IIUC you're saying that MPI might better exploit these architectures? Interesting POV!

Multicore has some interesting upsides. The downside, oversubscription of the memory pipes coming out of the sockets, reminds me of the days of the larger big-bus SMP boxes in the early/mid 90s.

First, shared memory is nice and simple as a programming model, and multicore suggests that it should be very easy to exploit. But you still have to worry about contention, affinity, and everything else we used to have to worry about a decade ago on the big machines. The precious resource whose utilization you need to optimize is no longer CPU cycles, but memory bandwidth.
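To put the affinity and bandwidth point in concrete terms, here is a minimal OpenMP sketch (my illustration, not part of the original argument; the array size and the kernel are assumptions): first-touch page placement on a NUMA box, followed by a memory-bound loop that stops scaling once the socket's memory pipes are saturated.

/* A minimal sketch, not from the original post: first-touch placement
 * plus a bandwidth-bound kernel.  Build with something like:
 *   gcc -O2 -fopenmp firsttouch.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 26)   /* ~64M doubles per array, far larger than cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);

    /* First touch: each thread initializes the pages it will later use,
     * so on a NUMA box those pages land near that thread's socket.
     * Initializing serially would put every page behind one memory
     * controller and serialize the bandwidth. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* The "work": roughly one flop per 16 bytes moved, so it is memory
     * bound.  Cores beyond what the memory pipes can feed add nothing. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("sum = %g (threads = %d)\n", sum, omp_get_max_threads());
    free(a);
    free(b);
    return 0;
}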

Second, MPI is a more complex model. It forces you to reconsider how the algorithm is mapped to the hardware, and it makes no assumptions about the hardware, at least in the API. The implementation, on the other hand, might be taught about multi-core, optimizing communication within boxes via shared memory and between boxes by other methods. I think a few of the MPI toolkits do this today (Scali, Intel, Open MPI, ...).
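For contrast, a bare-bones MPI sketch (again my own illustration, not anything taken from those toolkits): the ring exchange below uses the same MPI_Sendrecv call whether the neighbouring rank sits on the same socket or across the fabric; routing it over shared memory or the interconnect is entirely the implementation's business.

/* A hedged sketch of the explicit mapping MPI asks for.
 * Build with:  mpicc -O2 ring.c  and run with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    /* Pass a token around the ring.  The API says nothing about
     * multi-core; whether this hop crosses a bus, a socket, or a
     * switch is the implementation's problem. */
    int token = rank, received = -1;
    MPI_Sendrecv(&token, 1, MPI_INT, right, 0,
                 &received, 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d got token %d from rank %d\n", rank, received, left);

    MPI_Finalize();
    return 0;
}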

Neither of these models takes into account the fact that memory bandwidth out of a socket is finite. Technically this is an implementation issue, but as core counts grow, some codes, well, larger and larger fractions of the parallel code base, are likely to run into this resource contention.

We were seeing contention for the fabric interconnect (e.g. bus contention) in LAMMPS runs for a customer last year, simply going from single to dual core. It was significant enough that the customer opted for single core. This contention is not going to get better as you increase the number of cores. Since MPI depends, in part, upon a contended-for resource (the interconnect), it is not at all clear to me that MPI will be the *best* choice for programming all the cores, though it certainly would be a simple choice.

Greg is right when he notes that the hybrid model is a challenge. Unfortunately we appear to be facing a regime with multiple layers of hierarchy, so this will need resolution. You can create a globally "optimal" code via MPI that may not be as efficient locally as you would like, and will likely grow less so with more cores, or a locally optimal code via shared memory that never gets out of the box.
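As a rough picture of that hybrid layering, here is a hedged MPI-plus-OpenMP sketch (my own illustration, assuming MPI_THREAD_FUNNELED is sufficient, i.e. only the master thread talks to MPI): threads share memory inside each rank, and ranks pass messages between boxes.

/* A rough sketch of the hybrid layering.
 * Build with:  mpicc -O2 -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Local, shared-memory part of the work. */
    double local = 0.0;
    #pragma omp parallel reduction(+:local)
    {
        local += omp_get_thread_num() + 1;   /* stand-in for real work */
    }

    /* Global, message-passing part: combine the per-rank results. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global result: %g\n", global);

    MPI_Finalize();
    return 0;
}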

Shared memory scales nicely on NUMA machines, assuming 1-2 cores per memory controller. It won't/doesn't scale with 8 cores on one memory bus. How well does STREAM run on Clovertown? The NAS Parallel Benchmarks?
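For anyone who wants to eyeball it, here is a stripped-down triad in the spirit of STREAM (my sketch, not the benchmark itself; the size and the bytes-per-iteration estimate are rough assumptions). Run it with OMP_NUM_THREADS=1,2,4,8 and watch where the GB/s curve flattens.

/* A minimal triad kernel for eyeballing bandwidth scaling; use the
 * real STREAM benchmark for honest numbers.
 * Build with:  gcc -O2 -fopenmp triad.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 26)   /* well past any cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);

    /* Parallel (first-touch) initialization. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    /* Triad: read b and c, write a -- about 24 bytes moved per
     * iteration (more with write-allocate), one multiply-add of compute. */
    double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];
    double t1 = omp_get_wtime();

    printf("%d threads: ~%.1f GB/s\n", omp_get_max_threads(),
           (double)N * 24.0 / (t1 - t0) / 1e9);

    free(a); free(b); free(c);
    return 0;
}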

The issue, at the end of the day, is the contended-for resources.

Joe





--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615