On Jun 6, 2014, at 7:11 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Looks like Ralph's simpler solution fit the bill.

Yeah, but I'm still unhappy with it. It's about the stupidest connection model you can imagine. What happens is this:

* A process constructs its URI by creating a string with the IP:PORT for each subnet the proc is listening on. The URI is constructed in alphabetical order (well, actually in kernel index order, but that tends to follow the alphabetical order of the interface names). The URI then gets passed to the other process.

* The sender breaks the URI into its component parts and creates a list of addresses for the target. This list is created in the order of the components, i.e., we take the first IP:PORT out of the URI, and that is our first address.

* When the sender initiates a connection, it takes the first address in the list (which means the alphabetically first name in the target's list of interfaces) and initiates the connection on that subnet. If it succeeds, then that is the subnet we use for all subsequent messages.

So if the first subnet can reach the target, even if it means bouncing all over the Internet, we will use it, even though the second subnet in the URI might have provided a direct connection!

It solves Gilles' problem because "ib" comes after "eth", and it matches what was done in the original OOB (before my rewrite), but it sure sounds to me like a bad, inefficient solution for general use.
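
To make the three steps above concrete, here is a rough Python sketch of the model. It is not the actual OOB code. The ";" separator, the function names, the IPv4-only "IP:PORT" parsing, and the can_connect callback are all made up for illustration, and falling through to the next address when a connection attempt fails is my assumption about the failure case.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]                  # (ip, port)
Interface = Tuple[int, str, str, int]      # (kernel_index, name, ip, port)


def build_uri(interfaces: List[Interface]) -> str:
    """Step 1: emit one IP:PORT per interface, in kernel index order.

    The ';' separator is hypothetical; only the ordering matters here.
    """
    ordered = sorted(interfaces, key=lambda iface: iface[0])
    return ";".join(f"{ip}:{port}" for _, _, ip, port in ordered)


def parse_uri(uri: str) -> List[Address]:
    """Step 2: split the URI, preserving the order of its components."""
    addrs = []
    for part in uri.split(";"):
        ip, port = part.rsplit(":", 1)
        addrs.append((ip, int(port)))
    return addrs


def pick_address(addrs: List[Address],
                 can_connect: Callable[[Address], bool]) -> Optional[Address]:
    """Step 3: use the first address that connects, with no cost comparison.

    Moving on to the next address after a failure is an assumption.
    """
    for addr in addrs:
        if can_connect(addr):
            return addr
    return None


# Example: eth0 has the lower kernel index, so it is listed (and tried) first.
interfaces = [
    (3, "ib0", "10.1.0.5", 5000),
    (2, "eth0", "192.168.1.5", 5000),
]
uri = build_uri(interfaces)
chosen = pick_address(parse_uri(uri), lambda addr: True)
print(uri)
print(chosen)
```

Note that pick_address never looks at how good a route is: any address that connects wins, which is exactly the weakness described above.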