[OMPI devel] Commit r19868

Ralph Castain Fri, 31 Oct 2008 19:51:07 -0400

Hi all

I made a commit a little earlier that contains modifications that reduce duplicate data storage and represent a first step towards supporting fully routed RML communications, along with a new "radix tree" routed component requested by ORNL. There will undoubtedly be improvements to these changes over the next few months, but they provide an initial platform for us to more thoroughly investigate the issues involved in fully routing all out-of-band communications.


A brief outline of the changes:

1. removes the direct routed component and adds a new "radix" component

2. shifts storage of nidmap and pidmap info from the odls to the ess on daemons - this is where the data is stored for everyone else, so it makes no sense to store it someplace different on the daemon. Required adding an API to the ess framework so that a pidmap can be added to the data in the ess when daemons get a comm_spawn request (the ess data store was already set up for this - just didn't have the API yet).

3. adds an API to the ess framework to obtain the daemon that hosts a specified proc from the ess pidmap. Because this data is now obtained here, we don't need to keep calling orte_routed.update_route for every proc in our own job - so those calls have been removed from the startup procedure. This eliminates the hash tables in every routed module that essentially duplicated the pidmap already present in the ess - not because anyone was stupid, but rather because the first routed modules were originally written prior to the ess pidmap being created, and everyone copy/pasted from there.
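To illustrate items 2 and 3, here is a minimal sketch of the idea in Python rather than the actual C code. All names in it (ProcName, PidMap, add_job, daemon_for) are hypothetical and are not the ess API; it only shows a single shared proc-to-daemon map that can take on a new job at comm_spawn time and that a routed module can query instead of keeping its own per-proc hash table.

```python
from typing import Dict, NamedTuple


class ProcName(NamedTuple):
    """Hypothetical process name: a job id plus a rank (vpid) within the job."""
    jobid: int
    vpid: int


class PidMap:
    """Hypothetical stand-in for the ess pidmap: proc -> vpid of hosting daemon."""

    def __init__(self) -> None:
        # jobid -> {proc vpid -> daemon vpid}
        self._jobs: Dict[int, Dict[int, int]] = {}

    def add_job(self, jobid: int, placement: Dict[int, int]) -> None:
        """Add a job's placement, e.g. when a comm_spawn request arrives."""
        self._jobs[jobid] = dict(placement)

    def daemon_for(self, proc: ProcName) -> int:
        """Return the daemon hosting proc; raises KeyError if proc is unknown."""
        return self._jobs[proc.jobid][proc.vpid]


# One shared map replaces the per-module route tables.
pidmap = PidMap()
pidmap.add_job(1, {0: 1, 1: 1, 2: 2})   # initial job: procs 0,1 on daemon 1; proc 2 on daemon 2
pidmap.add_job(2, {0: 3})               # job added later by a comm_spawn

host_of_proc = pidmap.daemon_for(ProcName(jobid=1, vpid=2))
host_of_spawned = pidmap.daemon_for(ProcName(jobid=2, vpid=0))
```

With a lookup like this available, a routed module only has to decide how to reach the hosting daemon; it no longer needs a route entry registered for every proc in its own job.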

At the moment, the revised trunk fully routes all communications with two exceptions:

1. the binomial module still directly routes between all daemons - i.e., communications don't flow along the tree, but instead short-circuit the tree to go directly to the daemon that hosts the target proc. I propose to change this in a later revision, but want to leave something constant for the moment.

2. all routed modules have daemons sending direct to the HNP itself. This was required for two reasons:

(a) during startup, the daemons need to "phone home", but have no knowledge at that moment of the contact info for the other daemons in the routing tree. Thus, they have no choice but to send direct to the HNP. We hope to change this in a later revision by switching to well-known static ports - but for now, we have to go direct.

(b) in our current shutdown procedure, the outbound message telling the orteds to terminate goes out across the module's routing tree. This xcast procedure causes the daemon to relay the cmd to the next daemons in the tree, and then to execute it. Thus, after relaying the cmd, the daemon dutifully terminates. However, we require each daemon to send a confirming message to return to the HNP so it knows it can exit. That returning message cannot get through because the intermediate daemons have already terminated. I am working on alternative methods for detecting daemon termination so we can eliminate the return "ack" - but for now, we have to send the "ack" direct to the HNP to ensure it gets through.
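The shutdown problem in (b) can be sketched with a toy simulation. This is not ORTE code and the tree layout is an assumption: nodes are numbered breadth-first with the HNP as node 0, so the children of node r are r*radix+1 .. r*radix+radix, which is not necessarily how the radix component lays out its tree. It also assumes the worst case, where every daemon has already terminated by the time an ack from further down the tree reaches it. All function names are hypothetical.

```python
from collections import deque
from typing import List, Set

HNP = 0  # assumption: the HNP is the root of the tree, numbered 0


def children(node: int, radix: int, num_nodes: int) -> List[int]:
    """Children of node in a breadth-first-numbered tree of the given radix."""
    first = node * radix + 1
    return [c for c in range(first, first + radix) if c < num_nodes]


def parent(node: int, radix: int) -> int:
    """Parent of a non-root node in the same numbering."""
    return (node - 1) // radix


def relays_to_hnp(node: int, radix: int) -> List[int]:
    """Intermediate daemons a routed message from node to the HNP must cross."""
    hops = []
    hop = parent(node, radix)
    while hop != HNP:
        hops.append(hop)
        hop = parent(hop, radix)
    return hops


def simulate_shutdown(num_nodes: int, radix: int, routed_ack: bool) -> Set[int]:
    """Return the daemons whose termination ack reaches the HNP."""
    terminated: Set[int] = set()
    acked: Set[int] = set()
    pending = deque(children(HNP, radix, num_nodes))  # HNP starts the xcast
    while pending:
        daemon = pending.popleft()
        # Relay the terminate cmd to the next daemons in the tree first.
        pending.extend(children(daemon, radix, num_nodes))
        if routed_ack:
            # The ack must pass through every ancestor, and they are gone.
            delivered = all(h not in terminated for h in relays_to_hnp(daemon, radix))
        else:
            delivered = True  # sent direct to the HNP, no relays involved
        if delivered:
            acked.add(daemon)
        terminated.add(daemon)  # then execute the cmd
    return acked


routed = simulate_shutdown(num_nodes=7, radix=2, routed_ack=True)
direct = simulate_shutdown(num_nodes=7, radix=2, routed_ack=False)
```

In this toy model with an HNP plus six daemons, only the two daemons attached directly to the HNP get a routed ack through, while the direct acks from all six arrive.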

Some preliminary tests I've conducted indicate that fully routing communications had no detrimental impact on launch speed or IB wireup time. I plan to further test this at larger scales, as well as continue to develop the new capabilities.

Please let me know if you encounter any problems, or have any comments/suggestions.

Ralph
