Crumby - I referenced the wrong commit. My commit was r19866. My apologies to George, the author of r19868, which cleaned up a problem created by my commit.

Ralph

On Oct 31, 2008, at 5:50 PM, Ralph Castain wrote:

Hi all

I made a commit a little earlier containing modifications that reduce duplicate data storage and represent a first step towards supporting fully routed RML communications, along with a new "radix tree" routed component requested by ORNL. There will undoubtedly be improvements to these changes over the next few months, but they provide an initial platform for us to more thoroughly investigate the issues involved in fully routing all out-of-band communications.

A brief outline of the changes:

1. removes the direct routed component and adds a new "radix" component

2. shifts storage of nidmap and pidmap info from the odls to the ess on daemons - this is where the data is stored for everyone else, so it makes no sense to store it someplace different on the daemon. This required adding an API to the ess framework so that a pidmap can be added to the data in the ess when daemons get a comm_spawn request (the ess data store was already set up for this - it just didn't have the API yet).

3. adds an API to the ess framework to obtain the daemon that hosts a specified proc from the ess pidmap. Because this data is now obtained here, we don't need to keep calling orte_routed.update_route for every proc in our own job - so those calls have been removed from the startup procedure. This eliminates the hash tables in every routed module that essentially duplicated the pidmap already present in the ess - not because anyone was stupid, but rather because the first routed modules were written before the ess pidmap was created, and everyone copy/pasted from there.
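
The lookup described in item 3 can be sketched roughly as follows. This is a minimal Python illustration, not ORTE code: the names EssStore, add_pidmap, and proc_get_daemon are hypothetical, and it assumes that the pidmap records a node index for each proc while the nidmap records the daemon vpid for each node.

```python
class EssStore:
    """Toy stand-in for the per-process ess data store (all names hypothetical)."""

    def __init__(self, nidmap):
        # nidmap: node index -> vpid of the daemon running on that node
        self.nidmap = dict(nidmap)
        # pidmaps: jobid -> list where entry i is the node index hosting rank i
        self.pidmaps = {}

    def add_pidmap(self, jobid, pidmap):
        """Record the pidmap of a newly launched job (e.g. after a comm_spawn)."""
        self.pidmaps[jobid] = list(pidmap)

    def proc_get_daemon(self, jobid, rank):
        """Return the vpid of the daemon hosting (jobid, rank), or None if unknown."""
        pidmap = self.pidmaps.get(jobid)
        if pidmap is None or not 0 <= rank < len(pidmap):
            return None
        return self.nidmap.get(pidmap[rank])


# Two nodes hosted by daemons 1 and 2; job 1 puts ranks 0-1 on node 0 and rank 2 on node 1.
ess = EssStore(nidmap={0: 1, 1: 2})
ess.add_pidmap(jobid=1, pidmap=[0, 0, 1])
print(ess.proc_get_daemon(1, 2))  # prints 2
```

With a single store answering this question, a routed module has no need to keep its own per-proc hash table.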

At the moment, the revised trunk fully routes all communications with two exceptions:

1. the binomial module still directly routes between all daemons - i.e., communications don't flow along the tree, but instead short-circuit the tree to go directly to the daemon that hosts the target proc. I propose to change this in a later revision, but want to leave something constant for the moment.

2. all routed modules have daemons sending direct to the HNP itself. This was required for two reasons:

(a) during startup, the daemons need to "phone home", but have no knowledge at that moment of the contact info for the other daemons in the routing tree. Thus, they have no choice but to send direct to the HNP. We hope to change this in a later revision by switching to well-known static ports - but for now, we have to go direct.

(b) in our current shutdown procedure, the outbound message telling the orteds to terminate goes out across the module's routing tree. This xcast procedure causes the daemon to relay the cmd to the next daemons in the tree, and then to execute it. Thus, after relaying the cmd, the daemon dutifully terminates. However, we require each daemon to send a confirming message back to the HNP so it knows it can exit. That returning message cannot get through because the intermediate daemons have already terminated. I am working on alternative methods for detecting daemon termination so we can eliminate the return "ack" - but for now, we have to send the "ack" direct to the HNP to ensure it gets through.
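
The problem in 2(b) can be illustrated with a small Python simulation. It is a sketch under assumptions, not the actual routed module: it assumes the daemons form a tree of fixed radix numbered breadth-first with the HNP as vpid 0, that each daemon terminates before any ack from its children arrives, and that parent, children, and simulate_shutdown are hypothetical helpers.

```python
from collections import deque


def parent(vpid, radix):
    """Parent of a daemon in a breadth-first numbered tree (vpid 0 is the HNP)."""
    return None if vpid == 0 else (vpid - 1) // radix


def children(vpid, radix, num_daemons):
    """Children of a daemon in the same layout."""
    first = vpid * radix + 1
    return [c for c in range(first, first + radix) if c < num_daemons]


def simulate_shutdown(num_daemons, radix, ack_direct):
    """Xcast a terminate cmd down the tree; return how many acks reach the HNP."""
    alive = [True] * num_daemons
    acks = 0
    queue = deque(children(0, radix, num_daemons))  # the HNP starts the xcast
    while queue:
        d = queue.popleft()
        queue.extend(children(d, radix, num_daemons))  # relay the cmd first
        if ack_direct:
            acks += 1  # sent straight to the HNP, which is still alive
        else:
            # Route the ack up the tree; it is lost at the first dead hop.
            hop = parent(d, radix)
            while hop != 0 and alive[hop]:
                hop = parent(hop, radix)
            if hop == 0:
                acks += 1
        alive[d] = False  # then execute the cmd: terminate
    return acks


print(simulate_shutdown(7, 2, ack_direct=False))  # prints 2
print(simulate_shutdown(7, 2, ack_direct=True))   # prints 6
```

In this toy model with six daemons below the HNP, only the two daemons directly beneath the HNP get a routed ack through, while sending direct delivers all six.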

Some preliminary tests I've conducted indicate that fully routing communications has no detrimental impact on either launch speed or IB wireup time. I plan to test this further at larger scales, as well as continue to develop the new capabilities.

Please let me know if you encounter any problems, or have any comments/suggestions.
Ralph

