WHAT: convert orte to start by launching a virtual machine across all allocated 
nodes

WHY: support topologically-aware mapping methods

WHEN: sometime over the next couple of months

*******************************************************
Several of us (including Jeff, Terry, Josh, and Ralph) are working to create 
topologically-aware mapping modules. This includes modules that correctly map 
processes to cores/sockets, perhaps take into account NIC proximity and switch 
connectivity, etc.

In order to make this work, the rmaps components in mpirun need to know the 
local topology of the nodes in the allocation. We currently obtain that info 
from the orteds, as each orted samples the local topology via the opal sysinfo 
framework and then reports it back to mpirun. Unfortunately, we currently don't 
launch the orteds until -after- we map the job, so the topology info cannot be 
used in the mapping algorithm.

This work will modify the launch procedure to:

1. determine the final "allocation" using the current ras + hostfile + 
dash-host method

2. launch a daemon on every node in the final "allocation"

3. each daemon discovers the local resources and reports that info back to 
mpirun

4. mpirun maps the job against the daemons using the node resource info

5. mpirun sends the launch message to all daemons

6. the daemons launch the job -and- provide a global topology map to all procs 
for their subsequent use
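
To make the ordering concrete, here is a rough Python model of the six steps 
above. It is only a sketch, not ORTE code: the Node class, the function names, 
the (sockets, cores-per-socket) topology tuples, and the simple fill-by-core 
mapper are all hypothetical, and the hostfile/dash-host handling is reduced to 
a plain filter on the ras node list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical record of what one daemon reports back to mpirun."""
    name: str
    sockets: int
    cores_per_socket: int
    procs: list = field(default_factory=list)

    @property
    def cores(self):
        return self.sockets * self.cores_per_socket


def determine_allocation(ras_nodes, hostfile=None, dash_host=None):
    """Step 1: start from the ras node list, then filter (simplified assumption)."""
    names = list(ras_nodes)
    for allowed in (hostfile, dash_host):
        if allowed is not None:
            names = [n for n in names if n in allowed]
    return names


def launch_daemons(names, topology_db):
    """Steps 2-3: one daemon per allocated node, each reporting its topology."""
    return [Node(n, *topology_db[n]) for n in names]


def map_job(nodes, nprocs):
    """Step 4: map ranks using the reported core counts (fill each node in turn)."""
    rank = 0
    for node in nodes:
        while rank < nprocs and len(node.procs) < node.cores:
            node.procs.append(rank)
            rank += 1
    if rank < nprocs:
        raise RuntimeError("not enough cores in the allocation")
    return {node.name: list(node.procs) for node in nodes}


def launch_job(nodes, job_map):
    """Steps 5-6: every daemon gets the launch message and the global topology."""
    global_topology = {n.name: (n.sockets, n.cores_per_socket) for n in nodes}
    return {
        n.name: {"procs": job_map[n.name], "topology": global_topology}
        for n in nodes
    }


def run(nprocs):
    # Made-up topology data standing in for what the daemons would discover.
    topology_db = {"node0": (2, 4), "node1": (2, 4), "node2": (1, 4)}
    names = determine_allocation(["node0", "node1", "node2"])
    nodes = launch_daemons(names, topology_db)   # daemons exist before mapping
    job_map = map_job(nodes, nprocs)
    return launch_job(nodes, job_map)
```

Calling run(10) in this model leaves node2 with a daemon but no procs, which 
is exactly the case the revised procedure introduces.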

Note the significant change here: in the current procedure, we map the job on 
the nodes-to-be-used and then only launch daemons on nodes that have 
application procs on them. If the app then calls comm_spawn, we launch any 
additional daemons as required.

Under this revised procedure, we might launch daemons on nodes that are not 
used by the initial job. If the app then calls comm_spawn, no additional 
daemons will be required as we already have daemons on all available nodes. 
This simplifies comm_spawn, but precludes the ability of an app to dynamically 
discover and add nodes to the "allocation". There has been sporadic interest in 
such a feature, but nothing concrete.
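
The difference in daemon placement, and its effect on comm_spawn, can be 
illustrated with a small Python sketch. The function names and the dict-based 
job maps (node name -> list of ranks) are assumptions made for illustration 
only; they do not correspond to actual ORTE data structures.

```python
def daemons_current(allocation, job_map):
    """Current procedure: daemons only on nodes that host application procs."""
    return {node for node in allocation if job_map.get(node)}


def daemons_revised(allocation, job_map):
    """Revised procedure: a daemon on every allocated node, used or not."""
    return set(allocation)


def extra_daemons_for_spawn(existing_daemons, spawn_map):
    """Nodes used by a comm_spawn'd job that do not yet have a daemon."""
    needed = {node for node, procs in spawn_map.items() if procs}
    return needed - existing_daemons
```

With an allocation of three nodes, an initial job on two of them, and a 
comm_spawn that lands on the third, the current procedure has to launch one 
more daemon while the revised one launches none. Note that neither function 
can return a node outside the allocation, which mirrors the limitation 
described above.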


