Re: [OMPI devel] Intermittent hangs when exiting with error

Ralph Castain Fri, 6 Jun 2014 10:26:11 -0400 (EDT)

On Jun 6, 2014, at 7:11 AM, Jeff Squyres (jsquyres) <[email protected]> wrote:


> Looks like Ralph's simpler solution fit the bill.

Yeah, but I still am unhappy with it. It's about the stupidest connection model 
you can imagine. What happens is this:

* a process constructs its URI - this is done by creating a string with the 
IP:PORT for each subnet the proc is listening on. The URI is constructed in 
alphabetical order (well, actually in kernel index order - but that tends to 
follow the alphabetical order of the interface names). This then gets passed to 
the other process.

* the sender breaks the URI into its component parts and creates a list of 
addresses for the target. This list gets created in the order of the components 
- i.e., we take the first IP:PORT out of the URI, and that is our first address.

* when the sender initiates a connection, it takes the first address in the 
list (which means the alphabetically first name in the target's list of 
interfaces) and initiates the connection on that subnet. If it succeeds, then 
that is the subnet we use for all subsequent messages.

So if the first subnet can reach the target, even if it means bouncing all over 
the Internet, we will use it - even though the second subnet in the URI might 
have provided a direct connection!
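
To make the model concrete, here is a minimal sketch of the three steps above. 
It is not the actual OOB code: the function names, the ';'-separated 
"IP:PORT" URI layout, the sample interfaces, and the fall-through to the next 
address when a connect attempt fails are all assumptions made for illustration.

```python
# Illustrative sketch only. The names, the ';'-separated URI layout and the
# sample addresses are hypothetical, not taken from the OOB source.

def build_uri(interfaces):
    """Build a URI from (kernel_index, name, ip, port) tuples.

    Entries are emitted in kernel index order, which tends to follow the
    alphabetical order of the interface names.
    """
    ordered = sorted(interfaces, key=lambda iface: iface[0])
    return ";".join(f"{ip}:{port}" for _, _, ip, port in ordered)


def parse_uri(uri):
    """Split the URI into an address list, preserving the URI's order."""
    addrs = []
    for part in uri.split(";"):
        ip, port = part.rsplit(":", 1)
        addrs.append((ip, int(port)))
    return addrs


def pick_address(addrs, can_connect):
    """Return the first address that connects; route quality is never compared.

    Assumption: if an attempt fails, the next address in the list is tried.
    """
    for addr in addrs:
        if can_connect(addr):
            return addr
    return None


# Target listens on eth0 (kernel index 2) and ib0 (kernel index 3).
target_ifaces = [
    (3, "ib0", "10.1.0.5", 5001),
    (2, "eth0", "192.168.1.5", 5000),
]
uri = build_uri(target_ifaces)

# Both subnets are reachable, so the first entry wins regardless of path cost.
chosen = pick_address(parse_uri(uri), lambda addr: True)
```

With both subnets reachable, `chosen` is the eth0 address simply because it 
comes first in the URI; nothing in the selection looks at how direct the 
route is.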

It solves Gilles' problem because "ib" comes after "eth", and it matches what 
was done in the original OOB (before my rewrite) - but it sure sounds to me 
like a bad, inefficient solution for general use.


> 
> -- 
> Jeff Squyres
> [email protected]
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/06/14987.php
