On Jun 25, 2009, at 11:10 AM, Ralph Castain wrote:
They do flow along the route at all times. However, without static ports the orted has to start by directly connecting to the HNP and sending the orted's contact info to the HNP.
This is the part I don't understand. Why can't they send the contact info along the route as well? Don't they have enough information to wire a route to the HNP? If not, can't they be given it at startup?
Then the HNP includes that info in the launch msg, allowing the orteds to wireup their routes.
So the difference is that the static ports allow us to avoid that initial HNP-direct connection, which is what causes the flood.
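To put rough numbers on that difference, here is a minimal sketch that counts how many orteds would connect directly to the HNP in each case. It is only a toy model: the radix-2 tree, the rank numbering (HNP as rank 0), and the names `tree_children` and `hnp_direct_connections` are assumptions made for illustration, not ORTE's actual routing code.

```python
def tree_children(rank, total, radix=2):
    """Return the children of `rank` in a radix-ary tree over ranks 0..total-1."""
    first = rank * radix + 1
    return [c for c in range(first, first + radix) if c < total]


def hnp_direct_connections(num_orteds, static_ports, radix=2):
    """Count the orteds that open a connection straight to the HNP (rank 0)."""
    if not static_ports:
        # Without static ports, every orted first reports its contact
        # info directly to the HNP, so the HNP sees one connection each.
        return num_orteds
    # With static ports (assumed known in advance), only the HNP's
    # immediate children in the tree need to connect to it.
    return len(tree_children(0, num_orteds + 1, radix))


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, hnp_direct_connections(n, False), hnp_direct_connections(n, True))
```

In this model the non-static case grows linearly with the number of orteds, while the static case stays bounded by the tree radix.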
I should warn everyone that in my experiments the HNP flood is not the only problem with tree spawning. In fact, it doesn't even seem to be the limiting problem. At the moment, it appears that the limiting problem on my cluster has to do with sshd/rshd accessing some name service (e.g., gethostbyname, getpwnam, getdefaultproject, or something like that).
I am hoping to find that this is just some cluster configuration oddity. YMMV, of course.
The other thing that hasn't been done yet is to have the "procs-launched" messages roll up in the collective - the HNP gets one/daemon right now, even though it comes down the routed path. Hope to have that done next week. That will be in operation regardless of static vs non-static ports.
Great!

Iain