WHAT: Revise the global ORTE data structures:
* orte_app_context_t
* orte_node_t
* orte_job_t
* orte_proc_t
WHY: The current definitions are rigid and hard to extend. In the past, we
have extended them by hard-coding new fields into the structures. This has led
to issues for off-trunk researchers and developers, and caused the structures
to balloon in size.
WHEN: This is pretty disruptive and touches a lot of ORTE files, so let's
give it a few weeks and set the timeout for June 3rd, after the telecon.
BRANCH: https://bitbucket.org/rhc/ompi-rtc
PLEASE test your favorite mpirun options to ensure everything is working
correctly. There are quite a few combinations, and I can't possibly guarantee I
have hit them all.
****************************
More detail:
As noted in the summary, when we want to add another capability to the
system, we frequently wind up adding another dedicated field to the ORTE data
structures. For example, we have a number of booleans in the structures, each
of which may only be used in a single, uncommon use-case. Those wanting to
investigate new capabilities, or developers wishing to add something to the
system, not only need to add more fields to the structures, but also (a) ensure
that the datatype support routines know about them, (b) ensure that the odls
packing/unpacking functions know how to handle them, if the capability involves
launching processes, and (c) ensure that the nidmap code knows about any new
data fields.
Altogether, it is pretty intimidating and fragile - and adds memory footprint
for every feature.
As many of you know, we are about to add a number of new features to the system
(e.g., power/freq control, direct cgroup support). After starting to work on
these, it became apparent that we would be adding yet another set of rarely
used fields to the various structures, further increasing the memory footprint
for no good reason. Hence, I undertook a revision of not only the objects, but
also how we handle their transmission during launch.
The resulting code can be broken down into two key concepts:
* combining frequently used booleans into a single "flag" field in each
structure - the size of the flag varies between the structures according to the
number of required booleans. Macros are provided to set/unset/test flags so we
can easily revise the system as required (e.g., if we someday need to move to
opal_bitmap_t's instead of simple int-like fields).
* adding a list of "attributes" to each structure where infrequently used
and/or non-boolean options can be stored. A new "orte_attribute_t" structure is
defined that provides a key/value storage mechanism for these lists. In order
to conserve memory, the key is an integer instead of a string. Functions for
setting and getting attributes are provided. When an attribute is "set", you
also specify whether it is to be shared globally (i.e., to be included when
packing the associated structure's attribute list), or to be kept local.
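To make the first concept concrete, here is a minimal sketch of the flag-field idea. It is illustrative only: the type, flag names, and macro names (everything prefixed demo_ or DEMO_) are hypothetical and are not the definitions in the branch. The point is that callers only touch the macros, so the underlying storage could later change without touching call sites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag storage: one small integer per structure,
 * sized to the number of booleans that structure needs. */
typedef uint16_t demo_proc_flags_t;

/* Hypothetical flag names, one bit each. */
#define DEMO_PROC_FLAG_ALIVE  0x0001
#define DEMO_PROC_FLAG_ABORT  0x0002
#define DEMO_PROC_FLAG_LOCAL  0x0004

/* Stand-in for a structure that used to carry several bool fields. */
typedef struct {
    int rank;
    demo_proc_flags_t flags;
} demo_proc_t;

/* Set/unset/test macros hide the representation from callers. */
#define DEMO_FLAG_SET(obj, f)   ((obj)->flags |= (f))
#define DEMO_FLAG_UNSET(obj, f) ((obj)->flags &= (demo_proc_flags_t)~(f))
#define DEMO_FLAG_TEST(obj, f)  (((obj)->flags & (f)) != 0)

/* Exercise the macros and return the final flag word. */
demo_proc_flags_t demo_flags_example(void)
{
    demo_proc_t proc = { .rank = 0, .flags = 0 };

    DEMO_FLAG_SET(&proc, DEMO_PROC_FLAG_ALIVE);
    DEMO_FLAG_SET(&proc, DEMO_PROC_FLAG_LOCAL);
    assert(DEMO_FLAG_TEST(&proc, DEMO_PROC_FLAG_ALIVE));

    DEMO_FLAG_UNSET(&proc, DEMO_PROC_FLAG_ALIVE);
    assert(!DEMO_FLAG_TEST(&proc, DEMO_PROC_FLAG_ALIVE));

    return proc.flags; /* only the LOCAL bit remains set */
}
```

In this sketch three booleans cost two bytes in total rather than one field each, and a structure that needs more flags could simply use a wider integer type.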
Definitions of the new flags and attributes are provided in two new files:
* orte/util/attr.h - contains key and structure definitions for attributes,
and flag names plus macros
* orte/util/attr.c - contains the attribute support functions
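As a rough sketch of the attribute concept described above (integer keys, a per-structure list, and a local/global marker chosen at "set" time), something like the following would capture the idea. All names here are hypothetical stand-ins, the value is simplified to a single int64_t, and a plain linked list is assumed; none of this is taken from attr.h or attr.c.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical integer keys; integers are used instead of strings to save memory. */
typedef uint16_t demo_attr_key_t;
#define DEMO_ATTR_CPU_FREQ_KHZ 1
#define DEMO_ATTR_CGROUP_ID    2

#define DEMO_ATTR_LOCAL  true   /* kept on this node only */
#define DEMO_ATTR_GLOBAL false  /* included when the list is packed */

/* One key/value entry in a structure's attribute list. */
typedef struct demo_attr {
    struct demo_attr *next;
    demo_attr_key_t key;
    bool local;
    int64_t value;
} demo_attr_t;

/* Set (or overwrite) an attribute; returns 0 on success, -1 on allocation failure. */
int demo_set_attr(demo_attr_t **list, demo_attr_key_t key, bool local, int64_t value)
{
    for (demo_attr_t *a = *list; a != NULL; a = a->next) {
        if (a->key == key) {
            a->local = local;
            a->value = value;
            return 0;
        }
    }
    demo_attr_t *a = malloc(sizeof(*a));
    if (a == NULL) {
        return -1;
    }
    a->key = key;
    a->local = local;
    a->value = value;
    a->next = *list;
    *list = a;
    return 0;
}

/* Look up an attribute; returns true and fills *value if the key is present. */
bool demo_get_attr(const demo_attr_t *list, demo_attr_key_t key, int64_t *value)
{
    for (; list != NULL; list = list->next) {
        if (list->key == key) {
            if (value != NULL) {
                *value = list->value;
            }
            return true;
        }
    }
    return false;
}

/* Count the entries a pack routine would send, i.e. the non-local ones. */
size_t demo_count_global(const demo_attr_t *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->next) {
        if (!list->local) {
            n++;
        }
    }
    return n;
}

/* Release every entry in the list. */
void demo_free_attrs(demo_attr_t *list)
{
    while (list != NULL) {
        demo_attr_t *next = list->next;
        free(list);
        list = next;
    }
}
```

A structure that never uses a rare option pays only for an empty list pointer, and a new option needs a new key rather than a new field plus matching pack/unpack changes.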
These revisions have allowed me to not only reduce our memory footprint, but
also reduce the size of the launch message by removing a lot of duplicated and
unnecessary info. The nidmap and odls codes have been revamped accordingly.
Comments and/or suggestions are welcomed.
Ralph