On Mar 31, 2006, at 8:40 AM, Adrian Knoth wrote:

On Fri, Mar 31, 2006 at 10:44:11AM +0200, Christian Kauhaus wrote:

Hello *,

Hi.

University of Jena (Germany). Our work group is digging into how to
connect several clusters on a campus.

I think I'm also a member of this workgroup, though I am not
working at University of Jena, but studying there.

First, we are interested in integrating IPv6 support into the tcp btl.
Does anyone know if there is someone already working on this?

I have a first quick and dirty patch, replacing AF_INET by AF_INET6,
the sockaddr_in structs and so on.

Is there a way to do this that better supports both IPv4 and IPv6? It looks like you had to change an awful lot of interface declarations, making the code IPv6-only...

I think it is broken: the calculation of net1 and net2 in
btl_tcp_proc.c isn't really ported, and to be honest, I don't
understand the details, i.e. whether I have to port name lookups,
whether there are high-level structures relying on IPv4 structs,
and so on.

The port name information will all be in the modex share that I talked about in the previous e-mail, so it's just a matter of looking it up in the endpoint information. As for the code in mca_btl_tcp_proc_insert(), which is what I think you're referring to by the net1/net2 code, that's intended to get all the multi-NIC scenarios wired up in the most advantageous way possible. So we look at the combination of IPv4 address and netmask and prefer to connect two endpoints in the same subnet. We also try not to connect public and private addresses, as that rarely works the way people intend.

As an example, say we have two hosts, both with two NICs:

  host1: 129.79.200.1/255.255.0.0, 129.72.100.1/255.255.0.0
  host2: 129.79.200.2/255.255.0.0, 129.72.100.2/255.255.0.0

When host1 is trying to wire up connections to host2, it's going to figure out how to wire up the btl instance for the 79.200 address and the 72.100 address separately. For the 79.200.1 address, we're going to see that we have two addresses we can connect to: 129.79.200.2 and 129.72.100.2. By looking at netmasks and addresses, we can guess that the 79.200.2 address is on the "same" network and the 72.100.2 address is on a "different" network. I'm not sure how IPv6 deals with netmasks and routing, but I'm assuming there would be something similar.
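
To make the same-network guess concrete, here is a minimal sketch of the comparison, not the actual code in mca_btl_tcp_proc_insert(). The function names same_subnet_v4 and same_prefix_v6 are made up for illustration. The IPv6 variant assumes the usual IPv6 convention of a prefix length (e.g. /64) in place of a dotted netmask.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both dotted-quad addresses fall in the
 * same subnet under the given netmask, 0 if not, -1 on a parse error. */
int same_subnet_v4(const char *a, const char *b, const char *mask)
{
    struct in_addr ia, ib, im;

    if (inet_pton(AF_INET, a, &ia) != 1 ||
        inet_pton(AF_INET, b, &ib) != 1 ||
        inet_pton(AF_INET, mask, &im) != 1) {
        return -1;
    }
    /* All three values are in network byte order, so a plain bitwise
     * AND and compare is enough. */
    return (ia.s_addr & im.s_addr) == (ib.s_addr & im.s_addr);
}

/* Hypothetical helper: the IPv6 analogue, comparing the first prefix_len
 * bits of the two addresses. Returns 1, 0, or -1 as above. */
int same_prefix_v6(const char *a, const char *b, unsigned prefix_len)
{
    struct in6_addr ia, ib;
    unsigned full = prefix_len / 8;  /* whole bytes covered by the prefix */
    unsigned rem = prefix_len % 8;   /* leftover bits in the next byte */
    uint8_t m;

    if (prefix_len > 128 ||
        inet_pton(AF_INET6, a, &ia) != 1 ||
        inet_pton(AF_INET6, b, &ib) != 1) {
        return -1;
    }
    if (memcmp(ia.s6_addr, ib.s6_addr, full) != 0) {
        return 0;
    }
    if (rem == 0) {
        return 1;
    }
    m = (uint8_t)(0xFF << (8 - rem));  /* mask for the top rem bits */
    return (ia.s6_addr[full] & m) == (ib.s6_addr[full] & m);
}
```

With the hosts above, same_subnet_v4("129.79.200.1", "129.79.200.2", "255.255.0.0") reports a match, while comparing 129.79.200.1 against 129.72.100.2 under the same netmask does not.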

At least it compiles ;) (let's ship it)

I don't know if this patched tcp-component can handle
IPv6 connections, I've never tested it. I think it
even breaks IPv4 functionality; we should make clear
how IPv4 and IPv6 may work in parallel (or may not, if
one considers IPv4 deprecated ;)

You can retrieve the patch here:

   http://cluster.inf-ra.uni-jena.de/~adi/ompi.ipv6.v1.patch

I'd also appreciate any suggestions, hints or even success stories ;)

From a practical standpoint, Open MPI has to support both IPv4 and IPv6 for the foreseeable future. We currently try to wire up one connection per "IP device", so it seems like we should be able to find some way to automatically switch between IPv6 and IPv4 based on what we determine is available on that host, right? I'll admit it has been a year or so since I've looked at this, so I could be completely off base there.
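
One possible way to keep a single code path for both families, rather than swapping sockaddr_in for sockaddr_in6 everywhere, is to carry addresses in a struct sockaddr_storage and branch on the family only where needed. The sketch below is just an illustration of that idea. The helper name parse_endpoint is hypothetical and nothing here is taken from the tcp btl or from the patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a numeric IPv4 or IPv6 address into a
 * family-agnostic sockaddr_storage. Returns AF_INET or AF_INET6 on
 * success and -1 if the text is neither. */
int parse_endpoint(const char *text, unsigned short port,
                   struct sockaddr_storage *out, socklen_t *len)
{
    struct in_addr a4;
    struct in6_addr a6;

    memset(out, 0, sizeof(*out));

    if (inet_pton(AF_INET, text, &a4) == 1) {
        struct sockaddr_in *v4 = (struct sockaddr_in *)out;
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);
        v4->sin_addr = a4;
        *len = sizeof(*v4);
        return AF_INET;
    }
    if (inet_pton(AF_INET6, text, &a6) == 1) {
        struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)out;
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        v6->sin6_addr = a6;
        *len = sizeof(*v6);
        return AF_INET6;
    }
    return -1;
}
```

The returned family can be passed straight to socket(), and the storage plus length to connect() or bind(), so callers do not need separate IPv4 and IPv6 declarations.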

Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

