WHAT: Add hardware thread support to processor affinity components and
new options to orterun.
WHY: OMPI currently does not correctly recognize processors that support
hardware threads. When the user uses the mpirun options -bind-to-* and
-by*, processes are bound to the first thread on each core. When the
user specifies cores to bind to in the rankfile, those numbers are
interpreted as thread ids as opposed to core ids. These side effects
can lead to confusion as to which resources processes in a job are
bound to, and in the worst case a user could end up unknowingly
oversubscribing resources.
WHERE: orte/mca/rmaps, orte/mca/odls, orte/util/hostfile,
orte/tools/orterun/orterun.c, opal/mca/paffinity
WHEN: 03/15/10
TIMEOUT: 02/24/10
-----
The current OMPI paffinity implementation uses PLPA to set bindings of
processes to cores or sockets. In systems that support hardware
threads, however, PLPA looks at a hardware thread as a core and in
certain cases may not be able to completely map all hardware threads.
This happens because the paffinity framework does not recognize hardware
threads.
I propose adding support so that hardware thread resources can be
identified and processes can be bound to them. (Note: we plan to create
a new paffinity component using the hwloc API as opposed to extending
the PLPA component.)
Once the paffinity framework supports hardware threads I would like to
propose the following defaults and new options that will support
hardware threads. I think we should first implement the "Defaults"
section, put it back, and then start on new options and
rankfile/hostfile fields.
Defaults:
In the case of no process binding we maintain the current rule of not
doing anything.
When -bind-to-core is given, or a core binding is defined in the
rankfile, the MPI process will be bound to all hardware threads on a
core (the OS will manage the scheduling of processes between hardware
threads). This is similar to how OMPI handles scheduling of processes
on cores when the -bind-to-socket option is specified to mpirun.
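
As a rough illustration of the proposed default, the sketch below
contrasts the binding described in WHY (first thread of a core only)
with the proposed one (all hardware threads on the core). The function
names are hypothetical, and the contiguous numbering of hardware threads
within a core is an assumption made only for this example; real
numbering is platform specific and would come from the paffinity
component.

```python
def threads_on_core(core_id, threads_per_core):
    """Return the set of hardware thread ids that belong to one core.

    ASSUMPTION: threads are numbered contiguously per core
    (core 0 -> 0..n-1, core 1 -> n..2n-1, ...). Illustration only.
    """
    first = core_id * threads_per_core
    return set(range(first, first + threads_per_core))


def current_core_binding(core_id, threads_per_core):
    """Behavior described in WHY: only the first thread on the core."""
    return {min(threads_on_core(core_id, threads_per_core))}


def proposed_core_binding(core_id, threads_per_core):
    """Proposed default: every hardware thread on the core."""
    return threads_on_core(core_id, threads_per_core)
```

For example, with 4 hardware threads per core, binding to core 1 would
cover thread 4 alone today and threads 4-7 under the proposal (again,
under the numbering assumed above).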
New Options to mpirun:
1. -bind-to-thread - Bind processes to hardware threads, analogous to
-bind-to-core and -bind-to-socket
2. -threads-per-proc - Use the number of threads per process if used
with one of the -bind-to-* options
3. -bythread - Associate processes with successive hardware threads if
used with one of the -bind-to-* options
4. -num-threads - Specify the number of hardware threads per core (for
cases where Open MPI doesn't already know this information)
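
To make the intent of -bythread concrete, here is a minimal sketch of
how ranks might be associated with successive hardware threads. This is
not the rmaps implementation; the function name, the
(socket, core, thread) tuple layout, and the wrap-around when ranks
outnumber threads are all assumptions for illustration. The
threads_per_core argument plays the role of the value -num-threads
would supply.

```python
from itertools import product


def place_bythread(num_ranks, sockets, cores_per_socket, threads_per_core):
    """Map each rank to a (socket, core, thread) tuple, walking
    hardware threads in order so that the thread index varies fastest.

    ASSUMPTION: ranks wrap around to the first thread once every
    hardware thread has been used.
    """
    slots = list(product(range(sockets),
                         range(cores_per_socket),
                         range(threads_per_core)))
    return {rank: slots[rank % len(slots)] for rank in range(num_ranks)}
```

With one socket, two cores, and two threads per core, ranks 0, 1, and 2
would land on (0, 0, 0), (0, 0, 1), and (0, 1, 0) respectively.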
New Fields to Rankfiles:
We'll be adding a third field to the slot specification of the rankfile.
When a rankfile entry has 3 fields specified for a slot, the last field
is the hardware thread id. Otherwise it is assumed that hardware thread
scheduling is left up to the OS.
rank 0=aa slot=1:0:0-3
rank 1=bb slot=0:0
rank 2=cc slot=1-2
So in the case of rank 0, the process is bound to socket 1, core 0, and
hardware threads 0-3.
In the case of rank 1, it is bound to socket 0, core 0, and hardware
thread scheduling is left to the OS.
In the case of rank 2, it is bound to cores 1 and 2, and hardware thread
scheduling is left to the OS.
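
The three cases above can be sketched as a small parser. This is only an
illustration of the proposed field layout, not the actual rankfile
parser; the function names and the returned dictionary shape are
hypothetical, and comma-separated slot lists are deliberately not
handled.

```python
def parse_range(field):
    """Expand '0-3' to [0, 1, 2, 3] and '2' to [2]."""
    if "-" in field:
        low, high = field.split("-", 1)
        return list(range(int(low), int(high) + 1))
    return [int(field)]


def parse_slot(spec):
    """Parse the value of slot= into socket, cores, and threads.

    A value of None for 'threads' means hardware thread scheduling is
    left to the OS; None for 'socket' means no socket was given.
    """
    fields = spec.split(":")
    if len(fields) == 3:      # socket:core:thread (proposed new form)
        socket, cores, threads = fields
        return {"socket": int(socket),
                "cores": parse_range(cores),
                "threads": parse_range(threads)}
    if len(fields) == 2:      # socket:core
        socket, cores = fields
        return {"socket": int(socket),
                "cores": parse_range(cores),
                "threads": None}
    if len(fields) == 1:      # core or core range only
        return {"socket": None,
                "cores": parse_range(fields[0]),
                "threads": None}
    raise ValueError("too many fields in slot specification: " + spec)
```

Under these assumptions, parse_slot("1:0:0-3") yields socket 1, core 0,
threads 0 through 3, matching the rank 0 entry above.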