Crumby - I referenced the wrong commit. My commit was r19866. My
apologies to George, the author of r19868, which cleaned up a problem
created by my commit.
Ralph
On Oct 31, 2008, at 5:50 PM, Ralph Castain wrote:
Hi all
I made a commit a little earlier that contains modifications that
reduce duplicate data storage and represent a first step towards
supporting fully routed RML communications, along with a new "radix
tree" routed component requested by ORNL. There will undoubtedly be
improvements to these changes over the next few months, but they
provide an initial platform for us to more thoroughly investigate
the issues involved in fully routing all out-of-band communications.
A brief outline of the changes:
1. removes the direct routed component and adds a new "radix"
component
2. shifts storage of nidmap and pidmap info from the odls to the ess
on daemons - this is where the data is stored for everyone else, so
it makes no sense to store it someplace different on the daemon.
This required adding an API to the ess framework so that a pidmap
can be added to the data in the ess when daemons get a comm_spawn
request (the ess data store was already set up for this - it just
didn't have the API yet).
3. adds an API to the ess framework to obtain the daemon that hosts
a specified proc from the ess pidmap. Because this data is now
obtained here, we don't need to keep calling
orte_routed.update_route for every proc in our own job - so those
calls have been removed from the startup procedure. This eliminates
the hash tables in every routed module that essentially duplicated
the pidmap already present in the ess - not because anyone was
stupid, but rather because the first routed modules were originally
written prior to the ess pidmap being created, and everyone
copy/pasted from there.
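To make items 1-3 concrete, here is a minimal Python sketch (not ORTE code) of how a single pidmap lookup plus a tree rule can replace per-proc route tables. Everything in it is an assumption for illustration: the names (pidmap, daemon_for, next_hop), the fan-out of 2, and the breadth-first numbering of daemons with the HNP as daemon 0. The actual radix component may lay out its tree differently.

```python
# Illustrative sketch only. All names and the tree layout are hypothetical,
# not taken from the ORTE source.

RADIX = 2  # assumed fan-out of the routing tree
HNP = 0    # assumed: the HNP is daemon 0 and sits at the root

# Assumed pidmap: proc rank -> daemon that hosts it. One copy of this map
# stands in for the per-module hash tables of per-proc routes.
pidmap = {0: 1, 1: 1, 2: 4, 3: 6}


def parent(daemon, radix=RADIX):
    """Parent of a daemon when daemons are numbered breadth-first."""
    return None if daemon == HNP else (daemon - 1) // radix


def children(daemon, num_daemons, radix=RADIX):
    """Children of a daemon, clipped to the number of daemons that exist."""
    first = daemon * radix + 1
    return [c for c in range(first, first + radix) if c < num_daemons]


def next_hop(me, target, radix=RADIX):
    """Next daemon on the tree path from daemon `me` to daemon `target`."""
    if me == target:
        return me
    # Climb from the target towards the root. If we pass through one of
    # our own children, the target lives below us and that child is next.
    node = target
    while node != HNP:
        up = parent(node, radix)
        if up == me:
            return node
        node = up
    # Otherwise the target is elsewhere in the tree: send towards the root.
    return parent(me, radix)


def daemon_for(rank):
    """Look up the hosting daemon in the pidmap."""
    return pidmap[rank]


def route_to_proc(me, rank):
    """Route to a proc: one pidmap lookup, then the tree rule."""
    return next_hop(me, daemon_for(rank))
```

With 7 daemons the assumed tree is 0 -> (1, 2), 1 -> (3, 4), 2 -> (5, 6), so a message from daemon 3 to a proc hosted on daemon 6 first goes up to daemon 1. No per-proc update_route call is needed to work that out.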
At the moment, the revised trunk fully routes all communications
with two exceptions:
1. the binomial module still directly routes between all daemons -
i.e., communications don't flow along the tree, but instead short-
circuit the tree to go directly to the daemon that hosts the target
proc. I propose to change this in a later revision, but want to
leave something constant for the moment.
2. all routed modules have daemons sending direct to the HNP itself.
This was required for two reasons:
(a) during startup, the daemons need to "phone home", but have no
knowledge at that moment of the contact info for the other daemons
in the routing tree. Thus, they have no choice but to send direct to
the HNP. We hope to change this in a later revision by switching to
well-known static ports - but for now, we have to go direct.
(b) in our current shutdown procedure, the outbound message telling
the orteds to terminate goes out across the module's routing tree.
This xcast procedure causes the daemon to relay the cmd to the next
daemons in the tree, and then to execute it. Thus, after relaying
the cmd, the daemon dutifully terminates. However, we require each
daemon to send a confirming message back to the HNP so the HNP knows
it can exit. That returning message cannot get through because the
intermediate daemons have already terminated. I am working on
alternative methods for detecting daemon termination so we can
eliminate the return "ack" - but for now, we have to send the "ack"
direct to the HNP to ensure it gets through.
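The shutdown problem in (b) can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not the ORTE implementation: the function names are hypothetical, daemons are numbered breadth-first in a tree with fan-out `radix` and the HNP as daemon 0, and the ordering is the worst case in which every daemon has already terminated by the time its children try to send their acks.

```python
from collections import deque


def parent(daemon, radix):
    """Parent of a daemon in a breadth-first numbered tree (assumed layout)."""
    return (daemon - 1) // radix


def simulate_shutdown(num_daemons, radix=2, direct_ack=False):
    """Return the set of daemons whose ack reaches the HNP (daemon 0)."""
    alive = set(range(num_daemons))
    acked = set()
    queue = deque([0])  # the HNP starts the xcast of the terminate cmd
    while queue:
        d = queue.popleft()
        # Relay the cmd to the next daemons in the tree first...
        first = d * radix + 1
        queue.extend(c for c in range(first, first + radix) if c < num_daemons)
        if d == 0:
            continue  # the HNP stays up and waits for the acks
        # ...then execute it: send the ack and terminate.
        if direct_ack:
            acked.add(d)  # sent straight to the HNP, no relays involved
        else:
            # Routed ack: every intermediate daemon on the way up must
            # still be alive to relay it.
            hop = parent(d, radix)
            while hop != 0 and hop in alive:
                hop = parent(hop, radix)
            if hop == 0:
                acked.add(d)
        alive.discard(d)
    return acked
```

With 7 daemons and a fan-out of 2, the routed variant delivers acks only from daemons 1 and 2, the HNP's immediate children; the acks from daemons 3 through 6 are lost because their parents are already gone. With direct_ack=True all six acks arrive, which is why the ack currently goes direct to the HNP.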
Some preliminary tests I've conducted indicate that fully routing
communications has no detrimental impact on launch speed or IB
wireup time. I plan to test this further at larger scales, as well
as continue to develop the new capabilities.
Please let me know if you encounter any problems, or have any
comments/suggestions.
Ralph