WHAT: Add hardware thread support to processor affinity components and new options to orterun.

WHY: OMPI currently does not correctly recognize processors that support hardware threads. When the user passes the mpirun options -bind-to-* and -by-*, processes are bound to the first thread on each core. When the user specifies cores to bind to in the rankfile, those numbers are interpreted as thread ids rather than core ids. These side effects can lead to confusion about which resources the processes in a job are bound to, and in the worst case a user could end up unknowingly oversubscribing resources.

WHERE: orte/mca/rmaps, orte/mca/odls, orte/util/hostfile, orte/tools/orterun/orterun.c, opal/mca/paffinity
WHEN: 03/15/10

TIMEOUT: 02/24/10
-----
The current OMPI paffinity implementation uses PLPA to set bindings of processes to cores or sockets. In systems that support hardware threads, however, PLPA treats a hardware thread as a core and in certain cases may not be able to completely map all hardware threads. This happens because the paffinity framework does not recognize hardware threads. I propose adding support so that hardware thread resources can be identified and processes can be bound to them. (Note: we plan on creating a new paffinity component using the hwloc API as opposed to extending the PLPA component.)

Once the paffinity framework supports hardware threads I would like to propose the following defaults and new options that will support hardware threads. I think we should first implement the "Defaults" section, put it back, and then start on new options and rankfile/hostfile fields.

Defaults:

In the case of no process binding we maintain the current rule of not doing anything.

When -bind-to-core is given, or a core binding is defined in the rankfile, the MPI process will be bound to all hardware threads on that core (the OS will manage the scheduling of the process among the hardware threads). This is similar to how OMPI handles scheduling of processes on cores when the -bind-to-socket option is specified to mpirun.
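
As a rough illustration of this default, the sketch below computes the set of logical CPUs a process could run on when it is bound to a core. It assumes a hypothetical flat numbering in which the hardware threads of core c are numbered consecutively starting at c * threads_per_core; real systems number hardware threads differently, which is why the actual topology would come from hwloc. The function name and its parameters are illustrative only and are not part of OMPI.

```python
def core_binding_cpus(core, threads_per_core):
    """Return the logical CPU ids covered by a binding to `core`.

    ASSUMPTION (illustration only): hardware threads are numbered flatly,
    so core c owns ids c*threads_per_core .. (c+1)*threads_per_core - 1.
    """
    if core < 0 or threads_per_core < 1:
        raise ValueError("core must be >= 0 and threads_per_core >= 1")
    first = core * threads_per_core
    return set(range(first, first + threads_per_core))


# A process bound to core 1 on a machine with 2 hardware threads per core
# may be scheduled by the OS on either hardware thread of that core.
print(sorted(core_binding_cpus(1, 2)))  # prints [2, 3]
```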

New Options to mpirun:

1. -bind-to-thread - Bind processes to hardware threads, analogous to -bind-to-core and -bind-to-socket
2. -threads-per-proc - Use the number of threads per process if used with one of the -bind-to-* options
3. -bythread - Associate processes with successive hardware threads if used with one of the -bind-to-* options
4. -num-threads - Specify the number of hardware threads per core (for cases where Open MPI doesn't already know this information)
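
To make the intent of -bythread concrete, here is a minimal sketch of one way ranks could be associated with successive hardware threads. It assumes a single node with identical cores and fills every hardware thread of a core before moving to the next core. The function name, the (core, thread) tuple representation, and the choice to reject oversubscription are assumptions made for this illustration, not decided behavior.

```python
def map_by_thread(num_procs, num_cores, threads_per_core):
    """Assign each rank a (core, hardware thread) pair, threads first.

    ASSUMPTION (illustration only): one node, identical cores, and
    oversubscription is rejected instead of wrapping around.
    """
    if num_cores < 1 or threads_per_core < 1:
        raise ValueError("num_cores and threads_per_core must be >= 1")
    if num_procs > num_cores * threads_per_core:
        raise ValueError("more processes than hardware threads")
    return [(rank // threads_per_core, rank % threads_per_core)
            for rank in range(num_procs)]


# Three ranks on 2 cores with 2 hardware threads each.
print(map_by_thread(3, 2, 2))  # prints [(0, 0), (0, 1), (1, 0)]
```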


New Fields to Rankfiles:

We'll be adding a third field to the slot specification of the rankfile. When a rankfile entry specifies three fields for a slot, the last field is the hardware thread id. Otherwise, hardware thread scheduling is left up to the OS.
rank 0=aa slot=1:0:0-3
rank 1=bb slot=0:0
rank 2=cc slot=1-2

So in the case of rank 0, the process is bound to socket 1, core 0 and hardware threads 0-3. In the case of rank 1, it is bound to socket 0, core 0, and hardware thread scheduling is left to the OS. In the case of rank 2, it is bound to cores 1 and 2, and hardware thread scheduling is left to the OS.
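
The interpretation of the three example entries can be sketched as a small parser. This is only an illustration of the proposed field layout, not the rankfile parser in orte/mca/rmaps; the Slot type and the parse_slot and _expand helpers are hypothetical names, and only the single-value and range forms shown above are handled.

```python
from typing import List, NamedTuple, Optional


class Slot(NamedTuple):
    socket: Optional[int]          # None when no socket field is given
    cores: List[int]
    threads: Optional[List[int]]   # None: thread scheduling left to the OS


def _expand(field):
    """Expand '3' or '0-3' into a list of integer ids."""
    low, sep, high = field.partition("-")
    if not sep:
        return [int(low)]
    start, end = int(low), int(high)
    if end < start:
        raise ValueError("descending range: " + field)
    return list(range(start, end + 1))


def parse_slot(spec):
    """Parse 'cores', 'socket:cores' or 'socket:cores:threads'."""
    fields = spec.split(":")
    if len(fields) == 1:
        return Slot(None, _expand(fields[0]), None)
    if len(fields) == 2:
        return Slot(int(fields[0]), _expand(fields[1]), None)
    if len(fields) == 3:
        return Slot(int(fields[0]), _expand(fields[1]), _expand(fields[2]))
    raise ValueError("too many fields in slot: " + spec)


for spec in ("1:0:0-3", "0:0", "1-2"):
    print(spec, "->", parse_slot(spec))
```

In this sketch a missing third field is represented as threads=None, mirroring the rule that hardware thread scheduling is left to the OS unless the entry names thread ids explicitly.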


