On Jul 18, 2005, at 6:28 AM, Jeff Squyres wrote:

On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:

Generally speaking, if you launch <=N processes in a job on a node
(where N == number of CPUs on that node), then we set processor
affinity.  We set each process's affinity to the CPU number according
to the VPID ordering of the procs in that job on that node. So if you
launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
go to processor 1, etc. (it's an easy, locally-determined ordering).
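A minimal sketch of that mapping on Linux (not the actual Open MPI
code; "local_rank" is assumed to be the 0-based position of this
process's VPID among the VPIDs launched on the node):

  /* Bind the calling process to the CPU whose index equals its
   * local rank on the node.  Linux-specific. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  static int bind_to_local_rank(int local_rank)
  {
      cpu_set_t mask;
      CPU_ZERO(&mask);
      CPU_SET(local_rank, &mask);      /* VPID order -> CPU index */

      /* pid 0 means "the calling process" */
      if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
          perror("sched_setaffinity");
          return -1;
      }
      return 0;
  }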

   You'd need to be careful with dual-core CPUs.  Say you launch a
4-task MPI job on a 4-socket, dual-core Opteron.  You'd want to
schedule the tasks on cores 0, 2, 4, 6 - not 0, 1, 2, 3 - to get
maximum memory bandwidth for each MPI task.
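To illustrate the socket-spreading concern, a small sketch under the
assumption that cores on the same socket are numbered consecutively
(socket 0 = cores 0 and 1, and so on); the real numbering depends on
the BIOS and kernel:

  /* One task per socket: with 4 tasks on a 4-socket dual-core node
   * this picks cores 0, 2, 4, 6, so no two tasks share a socket's
   * memory controller. */
  static int core_for_rank(int local_rank, int cores_per_socket)
  {
      return local_rank * cores_per_socket;
  }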

With the potential for non-trivial logic like this, the extra work for
a real framework may well be justified.

Also, how would this work with hybrid MPI+threading (either pthreads
or OpenMP) applications?  Let's say you have an 8- or 16-CPU node and
you start up 2 MPI tasks with 4 compute threads in each task.  The
optimum layout may not be running the MPI tasks on CPUs 0 and 1.
Several hybrid applications that ran on ASC White and now Purple will
have these requirements.

Hum.  Good question.  The MPI API doesn't really address this -- the
MPI API is not aware of additional threads that are created until you
call an MPI function (and even then, we're not currently checking which
thread is calling -- that would just add latency).

What do these applications do right now?  Do they set their own
processor / memory affinity?  This might actually be outside the scope
of MPI...?  (I'm not trying to shrug off responsibility, but this
might be a case where the MPI simply doesn't have enough information,
and to get that information [e.g., via MPI attributes or MPI info
arguments] would be more hassle than the user just setting the affinity
themselves...?)
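For what it's worth, one thing a hybrid application can already do for
itself today, outside of MPI, is pin its own compute threads.  A rough
Linux sketch, where "rank", "thread_id", and "threads_per_rank" are
assumed to be supplied by the application:

  /* A hybrid app pinning one of its own compute threads.  The core
   * choice assumes a simple blocked layout: rank 0 gets cores
   * 0..threads_per_rank-1, rank 1 the next block, and so on.
   * Linux/NPTL-specific. */
  #define _GNU_SOURCE
  #include <pthread.h>
  #include <sched.h>

  static int pin_self(int rank, int thread_id, int threads_per_rank)
  {
      int core = rank * threads_per_rank + thread_id;
      cpu_set_t mask;
      CPU_ZERO(&mask);
      CPU_SET(core, &mask);
      return pthread_setaffinity_np(pthread_self(),
                                    sizeof(mask), &mask);
  }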

Comments?

If you set things up such that you can specify input parameters on
where to put each process, you have the flexibility you want.  The
locality APIs I have seen all mimicked the IRIX API, which had these
capabilities.  If you want some ideas, look at LA-MPI; it does this -
the implementation is pretty strange (just the coding), but it is
there.

Rich
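As a hypothetical illustration of "input parameters on where to put
each process" (not LA-MPI's or IRIX's actual interface): a
user-supplied map string such as "0,2,4,6" could be parsed and the
entry for the node-local rank used as the binding target:

  /* Hypothetical: bind according to a user-supplied CPU map string,
   * e.g. "0,2,4,6", indexed by node-local rank. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdlib.h>
  #include <string.h>

  static int bind_from_map(const char *map, int local_rank)
  {
      char *copy = strdup(map), *save = NULL;
      char *tok = strtok_r(copy, ",", &save);
      for (int i = 0; tok != NULL && i < local_rank; i++)
          tok = strtok_r(NULL, ",", &save);

      int rc = -1;
      if (tok != NULL) {
          cpu_set_t mask;
          CPU_ZERO(&mask);
          CPU_SET(atoi(tok), &mask);
          rc = sched_setaffinity(0, sizeof(mask), &mask);
      }
      free(copy);
      return rc;
  }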


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/

