[OMPI devel] Outstanding 1.3 RTE features

Ralph Castain Wed, 7 May 2008 10:24:21 -0400

At this week's telecon, we talked about when to branch the 1.3
release. I was asked to provide a list of where we stand relative to
promised functionality, at least as far as the RTE is concerned.


Here is what I have compiled, roughly grouped by the priority expressed to
me:


Promised, and needed
* topo mapper - automated mapping that puts ranks on network-nearest
  neighbors. Required by several of LANL's more ambitious science
  projects. I'll hopefully have a prototype in the system before
  leaving on vacation.

* xml output - required for Eclipse PTP support, desired by several
  other tools. As per the telecon, there is no way we can get
  something meaningful in the system before the proposed code
  branch. However, PTP needs this by October; the other tools
  have a more lenient timeframe. What we -can- do is get the
  output framework created before the branch, and then add the
  xml component during the summer - but that requires a change
  from our usual policy of no new components in sub-releases.
  Requires a new mpirun cmd line flag: -xml proposed.

* upgrades to the sequential mapper - add the ability to provide
  relative sequencing for automated node allocations, and to claim
  multiple slots for a rank. All of this fits within the existing
  component.

* local orted spawn - ability for remote orted to locally spawn
  a coprocessor process. Required for hybrid RR where MPI procs
  are needed on the coprocessor. Basic elements are in system,
  but need to be completed now that launch system is stabilizing.
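
To make the topo mapper item above concrete, here is a minimal Python sketch of one way "network-nearest neighbor" placement could work. It is not the ORTE implementation and the design of the actual mapper is not described in this message. The sketch assumes a hypothetical hop-count table between nodes and a per-node slot count; the names `hops`, `slots`, and `map_ranks_nearest` are invented for illustration. It fills one node with consecutive ranks, then greedily moves to the nearest node not yet used.

```python
def map_ranks_nearest(hops, slots, nranks, start):
    """Greedy sketch: fill a node, then move to the nearest unused node.

    hops   -- dict of dicts, hops[a][b] = hypothetical hop count from a to b
    slots  -- dict, slots[node] = number of ranks the node can hold
    nranks -- total number of ranks to place
    start  -- node that receives rank 0
    """
    mapping = {}                      # rank -> node
    unused = set(slots) - {start}     # nodes that have not been filled yet
    node, rank = start, 0
    while rank < nranks:
        # Place as many consecutive ranks as this node has slots for.
        for _ in range(slots[node]):
            if rank == nranks:
                break
            mapping[rank] = node
            rank += 1
        if rank < nranks:
            if not unused:
                raise ValueError("not enough slots for all ranks")
            # Pick the closest remaining node; ties are broken by node name
            # so the result is deterministic.
            node = min(unused, key=lambda n: (hops[node][n], n))
            unused.remove(node)
    return mapping
```

A greedy walk like this keeps neighboring ranks close together but is not guaranteed to minimize total hop count; a real mapper would likely weigh the application's communication pattern as well.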



Promised, could be delayed
* minimizing HNP sockets - everything we need is in the system.
  We just need to pass the nodemap to the orteds in a form that
  they can decode and use during their startup, so they don't
  have to call back to the HNP. The scheme has been designed -
  it just needs to be implemented.

* carto routed - uses the provided network topology to define
  RML message routing, thus minimizing message hops during
  startup.

* direct/standalone launch - I believe the basic infrastructure
  is now present, and indeed at least a couple of systems use
  standalone launch methods now. Expanding that to additional
  environments will take new PLM/ESS components, perhaps with
  supporting utilities. Likely not appropriate for a sub-release.

* static ports - basic infrastructure exists for procs and orteds to
  use static OOB/TCP ports, but we don't currently take advantage of it.
  Doing so shouldn't require any API changes or major restructuring of
  code, as everything required is already there.

* add-hostfile, add-host - these were included in the hostfile
  wiki page description as they had been requested by several users.
  If not included in 1.3, we need to update the wiki page and include
  that fact in the FAQ section, at the least, since users were
  told this would be supported.
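
The carto routed item above amounts to picking minimum-hop routes from a known topology. As a rough illustration only, the Python sketch below builds a next-hop table with a breadth-first search. The `links` adjacency table and the function name `next_hop_table` are hypothetical stand-ins; this message does not say how the carto data is represented or how the routed component would consume it.

```python
from collections import deque

def next_hop_table(links, source):
    """Return {destination: first hop from source} along a minimum-hop path.

    links  -- dict mapping each node to an iterable of directly connected
              nodes (a hypothetical stand-in for the network topology)
    source -- node whose routing table is being built
    """
    first_hop = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own first hop.
    for nbr in sorted(links.get(source, ())):
        if nbr not in visited:
            visited.add(nbr)
            first_hop[nbr] = nbr
            queue.append(nbr)
    # Breadth-first search reaches nodes in order of increasing hop count,
    # so the first route found to a node is a shortest one.
    while queue:
        node = queue.popleft()
        for nbr in sorted(links.get(node, ())):
            if nbr not in visited:
                visited.add(nbr)
                first_hop[nbr] = first_hop[node]
                queue.append(nbr)
    return first_hop
```

Each daemon would need only its own table: to reach any destination, it forwards the message to the listed first hop, which repeats the lookup in its own table.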



Wanted/Requested by various users or developers
* orted sm file - some of our improved behavior depends upon
  exclusive use of nodes. We can remove that constraint by
  letting jobs from different users that are colocated on a node
  have knowledge of each other's existence. It has been
  proposed that this be accomplished by creating a shared memory
  area that the procs/orteds can access to find out who else
  is on a node, what static ports they are using, etc. Design
  still to be worked out.

* usage reporting - add an appropriate mpirun cmd line option to
  request that the orteds report proc resource usage upon proc
  termination. Pretty trivial to do. Requested by a few users
  and a couple of tool developers.

* tool query support - ability for a tool to interactively
  query process/job status, usage stats, etc. The tool comm
  library is partially implemented today, but doesn't support
  the full range of requested functionality.

* support for recursive mpirun calls - this has come up a few
  times on the user list. Basically, it requires adding a new
  mpirun cmd line option (--recursive) so mpirun can purge the
  environment of mca params set during spawn before calling
  orte_init.
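
The environment purge described in the recursive mpirun item can be sketched in a few lines. Open MPI passes MCA params through the environment as variables prefixed with `OMPI_MCA_`, so the sketch filters on that prefix. It is only an illustration in Python: the function name `purge_mca_params` and the `keep` argument are invented here, and removing every prefixed variable is an assumption, since the item only calls for purging the params set during spawn.

```python
MCA_PREFIX = "OMPI_MCA_"  # prefix Open MPI uses for MCA params in the environment

def purge_mca_params(env, keep=()):
    """Return a copy of env without MCA params.

    env  -- mapping of environment variable names to values
    keep -- names to preserve even though they carry the MCA prefix
            (hypothetical escape hatch, not part of the proposal)
    """
    return {
        name: value
        for name, value in env.items()
        if not name.startswith(MCA_PREFIX) or name in keep
    }
```

A wrapper script could apply the same filter today, for example by passing `purge_mca_params(dict(os.environ))` as the `env` argument of `subprocess.run` when it starts a nested mpirun.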



Future improvements
* reduced launch messaging - put launch information in orted's environment
  (for systems that support it) so that orted can determine and launch
  its local procs without communicating back to the HNP. We have a design
  for this capability, but have purposely held off implementation until
  after the 1.3 branch.

* minimized mpirun memory footprint - we currently store a bunch of info
  to support various debuggers, c/r, etc. This info isn't actually required
  to be stored for operation of the MPI job and/or ORTE, so it could
  either be released or simply not created. This plan calls for yet
  another option(!) that would tell mpirun to minimize its memory
  footprint. Design has been done - implementation has not started.