[OMPI devel] RFC: Processor affinity hardware thread support

Terry Dontje Thu, 11 Feb 2010 13:27:32 -0500

WHAT: Add hardware thread support to the processor affinity components and add new options to orterun.

WHY: OMPI currently does not correctly recognize processors that support hardware threads. When the user passes the mpirun options -bind-to-* and -by-*, processes are bound to the first thread on each core. When the user specifies cores to bind to in the rankfile, those numbers are interpreted as thread IDs rather than core IDs. These side effects can cause confusion about which resources the processes in a job are bound to, and in the worst case a user could unknowingly oversubscribe resources.

WHERE: orte/mca/rmaps, orte/mca/odls, orte/util/hostfile, orte/tools/orterun/orterun.c, opal/mca/paffinity

WHEN: 03/15/10

TIMEOUT: 02/24/10
-----

The current OMPI paffinity implementation uses PLPA to bind processes to cores or sockets. On systems that support hardware threads, however, PLPA treats each hardware thread as a core and in certain cases may not be able to completely map all hardware threads. This happens because the paffinity framework does not recognize hardware threads. I propose adding support so that hardware thread resources can be identified and processes can be bound to them. (Note: we plan to create a new paffinity component using the hwloc API rather than extending the PLPA component.)

Once the paffinity framework supports hardware threads, I would like to propose the following defaults and new options to support hardware threads. I think we should first implement the "Defaults" section, put it back, and then start on the new options and the rankfile/hostfile fields.


Defaults:

In the case of no process binding, we maintain the current rule of not doing anything.

When -bind-to-core is specified or a core binding is defined in the rankfile, the MPI process will be bound to all hardware threads on that core (the OS will manage the scheduling of processes between hardware threads). This is similar to how OMPI handles scheduling of processes on cores when the -bind-to-socket option is passed to mpirun.
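
To make the proposed default concrete, here is a minimal C sketch that computes which hardware threads a core binding would cover, next to the first-thread-only behavior the RFC describes as today's. It assumes a hypothetical uniform topology in which logical hardware thread (PU) IDs are numbered contiguously, socket by socket and core by core. Real machines may number them differently, which is one reason to rely on hwloc rather than arithmetic like this. The type and function names are illustrative only and are not part of OMPI or its paffinity API.

```c
#include <stdint.h>

/* Hypothetical uniform topology description (illustrative, not an OMPI type). */
typedef struct {
    int sockets;
    int cores_per_socket;
    int threads_per_core;
} topo_t;

/* Logical ID of hardware thread `thread` on core `core` of socket `socket`,
 * assuming PUs are numbered contiguously socket by socket, core by core. */
static int pu_id(const topo_t *t, int socket, int core, int thread)
{
    return (socket * t->cores_per_socket + core) * t->threads_per_core + thread;
}

/* Proposed default: a core binding covers every hardware thread on the core.
 * Returns a bitmask of PU IDs (assumes fewer than 64 PUs in total). */
static uint64_t core_binding_mask(const topo_t *t, int socket, int core)
{
    uint64_t mask = 0;
    for (int th = 0; th < t->threads_per_core; th++)
        mask |= UINT64_C(1) << pu_id(t, socket, core, th);
    return mask;
}

/* Behavior described in the WHY section: only the core's first thread. */
static uint64_t first_thread_mask(const topo_t *t, int socket, int core)
{
    return UINT64_C(1) << pu_id(t, socket, core, 0);
}
```

On a hypothetical node with 2 sockets, 4 cores per socket and 2 threads per core, binding to socket 1, core 0 yields PUs 8 and 9 (mask 0x300) under the proposed default, versus PU 8 alone (mask 0x100) with first-thread binding.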


New Options to mpirun:

1. -bind-to-thread - Bind processes to hardware threads, analogous to -bind-to-core and -bind-to-socket.
2. -threads-per-proc - Use the specified number of hardware threads per process when used with one of the -bind-to-* options.
3. -bythread - Associate processes with successive hardware threads when used with one of the -bind-to-* options.
4. -num-threads - Specify the number of hardware threads per core (for cases where Open MPI doesn't already know this information).
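
One possible reading of how -bythread, -threads-per-proc and -bind-to-thread could combine is sketched below: rank r takes the next threads_per_proc hardware threads in logical order, and threads_per_core may come from -num-threads when Open MPI cannot detect it. This is an assumption, since the RFC does not pin down these semantics. The sketch also assumes contiguous logical PU numbering starting at 0 and refuses placements that would oversubscribe the node. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: first logical hardware thread assigned to `rank` under
 * -bythread when each process gets `threads_per_proc` threads.
 * `threads_per_core` could be supplied by -num-threads.
 * Returns -1 if the placement would oversubscribe the node. */
static int bythread_first_pu(int rank, int threads_per_proc,
                             int num_cores, int threads_per_core)
{
    int total_pus = num_cores * threads_per_core;
    int first = rank * threads_per_proc;
    if (rank < 0 || threads_per_proc < 1 || first + threads_per_proc > total_pus)
        return -1;
    return first;
}

/* Bitmask of the hardware threads the rank would be bound to with
 * -bind-to-thread (assumes fewer than 64 hardware threads per node).
 * Returns 0 when the placement is not possible. */
static uint64_t bythread_mask(int rank, int threads_per_proc,
                              int num_cores, int threads_per_core)
{
    int first = bythread_first_pu(rank, threads_per_proc,
                                  num_cores, threads_per_core);
    uint64_t mask = 0;
    if (first < 0)
        return 0;
    for (int i = 0; i < threads_per_proc; i++)
        mask |= UINT64_C(1) << (first + i);
    return mask;
}
```

Rejecting the placement outright, rather than wrapping around, is a design choice made here to surface the silent oversubscription the WHY section warns about.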



New Fields to Rankfiles:

We'll be adding a third field to the slot specification of the rankfile. When a rankfile entry has three fields specified for a slot, the last field is the hardware thread ID. Otherwise, hardware thread scheduling is left up to the OS.

rank 0=aa slot=1:0:0-3
rank 1=bb slot=0:0
rank 2=cc slot=1-2

So in the case of rank 0, the process is bound to socket 1, core 0, and hardware threads 0-3. In the case of rank 1, it is bound to socket 0, core 0, and hardware thread scheduling is left to the OS. In the case of rank 2, it is bound to cores 1 and 2, and hardware thread scheduling is left to the OS.
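
The slot forms in the examples above can be parsed with a small splitter on ':'. The following is a minimal, hypothetical C sketch, not OMPI's actual rankfile parser. It handles only single IDs and "a-b" ranges (no comma-separated lists) and follows the RFC's reading that a one-field slot such as "1-2" names cores. An absent field is reported as -1, which for the thread field means scheduling is left to the OS.

```c
#include <stdlib.h>

/* Inclusive ID range; lo == hi for a single ID, lo == -1 if the field is absent. */
typedef struct { int lo, hi; } id_range_t;

typedef struct {
    id_range_t socket;  /* absent in the single-field form, e.g. "1-2" */
    id_range_t core;
    id_range_t thread;  /* absent => hardware thread scheduling left to the OS */
} slot_spec_t;

/* Parse "N" or "N-M" occupying exactly [s, end). Returns 0 on success. */
static int parse_range(const char *s, const char *end, id_range_t *r)
{
    char *p;
    long lo = strtol(s, &p, 10);
    if (p == s || lo < 0)
        return -1;
    long hi = lo;
    if (p < end && *p == '-') {
        const char *q = p + 1;
        hi = strtol(q, &p, 10);
        if (p == q || hi < lo)
            return -1;
    }
    if (p != end)
        return -1;
    r->lo = (int)lo;
    r->hi = (int)hi;
    return 0;
}

/* Split a slot spec on ':' into 1-3 fields:
 *   "C"     -> core(s) only
 *   "S:C"   -> socket S, core(s) C
 *   "S:C:T" -> socket S, core(s) C, hardware thread(s) T
 * Returns 0 on success, -1 on a malformed spec. */
static int parse_slot(const char *spec, slot_spec_t *out)
{
    const char *fields[3], *ends[3];
    const char *start = spec;
    int n = 0;

    for (const char *p = spec; ; p++) {
        if (*p == ':' || *p == '\0') {
            if (n == 3)
                return -1;          /* more than three fields */
            fields[n] = start;
            ends[n] = p;
            n++;
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }

    id_range_t absent = { -1, -1 };
    out->socket = out->core = out->thread = absent;

    if (n == 1)
        return parse_range(fields[0], ends[0], &out->core);
    if (parse_range(fields[0], ends[0], &out->socket) != 0 ||
        parse_range(fields[1], ends[1], &out->core) != 0)
        return -1;
    if (n == 3)
        return parse_range(fields[2], ends[2], &out->thread);
    return 0;
}
```

For example, parsing "1:0:0-3" yields socket 1, core 0 and threads 0 through 3, while "0:0" leaves the thread range at -1.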
