carto is intended primarily to be a discoverer and provider of topology information. How various parts of the OMPI code base use that information is a separate issue.

With regards to processor affinity, there are two general ways of doing it:

1. The resource manager tells us which processors have been allocated to us. E.g., it provides some environment variables saying which processors/cores/whatever have been allocated to us on a per-host basis (i.e., in the environment of the launched applications, and therefore possibly different on every host). Open MPI then decides how to split up the allocated host processors amongst all the Open MPI processes on that host.

It would be great if SGE could provide some environment variables to us.

2. The resource manager does all the processor affinity itself. SLURM, for example, has a nice command line syntax for all kinds of processor affinity stuff in their "srun" command. A traditional roadblock to this has been that OMPI currently uses the resource manager to launch a single "orted" process on each node, and then that orted, in turn, launches all the MPI processes locally. However, there is work progressing to remove this roadblock. If I try to describe it, I'm sure I'll get it wrong :-) -- Ralph / IU?

-----

Open MPI will need to be able to tell the difference between #1 and #2. So it might be good if the RM always provides the environment variables, but has those variables tell us whether the RM did the affinity pinning or not. I.e., in #1, you'll get information about all the processors that are available, and all the processes on a single host will get the same information. In #2, each process will get individualized information about where it has been pinned.

Make sense?



On Jan 11, 2008, at 6:22 AM, Pak Lui wrote:

Hi Rayson,

I guess this is an issue only for SGE. I believe a framework called 'carto' is being developed to represent the node-socket relationship in order to address the multicore issue. There are other folks on the team actively working on it, so they can probably address it better than I can. Here are some descriptions of it on the wiki:

https://svn.open-mpi.org/trac/ompi/wiki/OnHostTopologyDescription

Rayson Ho wrote:
Hello,

I'm from the Sun Grid Engine (SGE) project (
http://gridengine.sunsource.net ). I am working on processor affinity
support for SGE.

In 2005, we had some discussions on the SGE mailing list with Jeff on
this topic. Now that quad-core processors are available from AMD and Intel,
and higher core counts per socket are coming soon, I would like to see
what we can do to come up with a simple interface for the SGE 6.2
release, which will be available in Q2 this year (or at least in an
"update" release of SGE 6.2 if we can't get the changes in on time).

The discussions we had before:
http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7081
http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=4803

I looked at the SGE code; the simplest thing we can do is set an
environment variable before we start each task group, telling the task
group the processor mask of the node. Is that good enough for Open MPI?

After reading the Open MPI code, I believe what we need to do is add an
else case in ompi/runtime/ompi_mpi_init.c:

if (ompi_mpi_paffinity_alone) {
    ...
} else {
    /* get processor affinity information from the batch system
       via the environment variable */
    ...
}

Thanks,
Rayson
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--


- Pak Lui
pak....@sun.com


--
Jeff Squyres
Cisco Systems
