On Mar 31, 2006, at 3:44 AM, Christian Kauhaus wrote:

first I'd like to introduce myself. I'm Christian Kauhaus and I am
currently working at the Department of Computer Architecture at the
University of Jena (Germany). Our work group is digging into how to
connect several clusters on a campus.

As part of our research, we'd like to evaluate the use of IPv6 for
multi-cluster coupling. Therefore, we need to run OpenMPI over TCP/IPv6.
Last year during EuroPVM/MPI I already had a short chat with Jeff
Squyres about this, but now we actually do have the time to work on
this.

Great! We currently only have IPv4 support, but IPv6 has always been on the wishlist. Most of the developers in the States don't have access to IPv6 networks, so it hasn't been a need we've had time to address so far. It would be great if someone else could take a stab at it.

First, we are interested in integrating IPv6 support into the tcp btl. Does
anyone know whether someone is already working on this? If so, we would
be glad to cooperate. If not, we would start on it ourselves, although
we would need some help from the OpenMPI developer community regarding
OpenMPI / ORTE internals.

As far as I'm aware, there is no one working on IPv6 support for Open MPI. We would welcome anyone willing to work on the support :). And we'll be as responsive as possible to requests for help / advice - this list is the best forum for that type of discussion.

Are your hosts configured for both IPv4 and IPv6 traffic (or are they IPv6 only)? I ask because that will determine what your first step is. There are two TCP communication channels in Open MPI -- the tcp oob component, used by the run-time layer for out-of-band communication, and the tcp btl component, used by the MPI layer for MPI traffic. Without a working tcp oob component, it's pretty close to impossible to start the tcp btl, so if you only have IPv6 on your machines, that will dictate starting with the tcp oob component. Otherwise, you could start with either component (although both will eventually need to be updated).
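
As a rough illustration (this is not Open MPI code, just standard POSIX socket calls), the direction both components would eventually move in is address-family-agnostic connection setup via getaddrinfo(), rather than hard-coded AF_INET sockets. Something along these lines:

  /* Sketch only -- not Open MPI code.  Shows the address-family-agnostic
   * connect logic that both the tcp oob and tcp btl components would
   * eventually need instead of hard-coded AF_INET sockets. */
  #include <string.h>
  #include <unistd.h>
  #include <netdb.h>
  #include <sys/socket.h>

  static int connect_any_family(const char *host, const char *port)
  {
      struct addrinfo hints, *res, *p;
      int sd = -1;

      memset(&hints, 0, sizeof(hints));
      hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 results */
      hints.ai_socktype = SOCK_STREAM;

      if (getaddrinfo(host, port, &hints, &res) != 0) {
          return -1;
      }
      for (p = res; p != NULL; p = p->ai_next) {
          sd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
          if (sd < 0) {
              continue;
          }
          if (connect(sd, p->ai_addr, p->ai_addrlen) == 0) {
              break;                    /* connected */
          }
          close(sd);
          sd = -1;
      }
      freeaddrinfo(res);
      return sd;                        /* -1 if nothing worked */
  }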

The oob tcp component (code is in orte/mca/oob/tcp/) is fairly straightforward, especially if all you need to deal with is connection setup. There are really two pieces to be aware of. In oob_tcp.c there is some code dealing with uri strings; it is used by the upper layers to ask the oob component for its contact address (as a uri string) and to give the oob component a uri string to associate with an orte_process_name. The peer connection code is in a combination of oob_tcp_peer.[h,c] and oob_tcp_addr.[h,c]. I'm sure you will have to modify oob_tcp_addr.[h,c], and I think you'll probably have to modify oob_tcp_peer.[h,c] as well.
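
To give a feel for the uri side (the function name and uri layout here are purely illustrative, not the actual oob contact format): the main wrinkle IPv6 adds is that address literals contain colons, so they are usually wrapped in brackets to keep the trailing port unambiguous. A hedged sketch:

  /* Hypothetical sketch -- not the actual oob uri format.  Builds a
   * contact string that works for both IPv4 and IPv6 endpoints. */
  #include <stdio.h>
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>

  static int contact_uri_from_sockaddr(const struct sockaddr *sa,
                                       char *buf, size_t len)
  {
      char addr[INET6_ADDRSTRLEN];

      if (sa->sa_family == AF_INET) {
          const struct sockaddr_in *v4 = (const struct sockaddr_in *) sa;
          inet_ntop(AF_INET, &v4->sin_addr, addr, sizeof(addr));
          return snprintf(buf, len, "tcp://%s:%u", addr, ntohs(v4->sin_port));
      } else if (sa->sa_family == AF_INET6) {
          const struct sockaddr_in6 *v6 = (const struct sockaddr_in6 *) sa;
          inet_ntop(AF_INET6, &v6->sin6_addr, addr, sizeof(addr));
          /* brackets keep the colons in the IPv6 literal separate
           * from the ":port" suffix */
          return snprintf(buf, len, "tcp://[%s]:%u", addr, ntohs(v6->sin6_port));
      }
      return -1;
  }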

I should digress for a second... Every process in Open MPI has an orte_process_name. This value is unique among processes that can connect to each other. When I want to send an out-of-band message to a remote host, I send to that orte_process_name and the communication layer figures out how to get the message over there. So if the upper layers associate an orte_process_name with a uri string, you'll use that information to contact that orte_process_name, should you ever need to send data that way.
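
If it helps to picture it, here's a much-simplified sketch (the type and field names are illustrative, not the real ORTE definitions) of the name-to-contact-information mapping the oob component maintains; with IPv6 support, the stored contact information could be of either address family:

  /* Purely illustrative -- simplified names, not the real ORTE types.
   * Out-of-band sends are addressed by process name, and the oob layer
   * maps that name to whatever contact information it learned from the
   * peer's uri string (IPv4 or IPv6). */
  #include <stdint.h>

  typedef struct {
      uint32_t jobid;                 /* job the process belongs to */
      uint32_t vpid;                  /* rank within that job */
  } example_process_name_t;

  typedef struct {
      example_process_name_t name;    /* who this entry describes */
      char contact_uri[128];          /* e.g. "tcp://[fe80::1]:5000" (made up) */
  } example_peer_entry_t;

  /* find the contact uri previously registered for a peer name */
  static const char *example_lookup(const example_peer_entry_t *table, int n,
                                    const example_process_name_t *who)
  {
      for (int i = 0; i < n; i++) {
          if (table[i].name.jobid == who->jobid &&
              table[i].name.vpid == who->vpid) {
              return table[i].contact_uri;
          }
      }
      return NULL;
  }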

The tcp btl is mostly the same type of thing. The main difference is how peers are set up. Instead of a char string to share endpoint connections, we have what we call the "modex". This is basically a write-once, read-many global data store. The tcp btl puts a fixed-size structure into the modex data (behind the scenes, this is stored in our gpr data store), and each process in the universe can get that data by looking it up against the publishing process's name (actually, in this case, it's a data structure called the ompi_proc, which is an orte_process_name plus the data needed for each MPI process). So we'd need to extend that data structure a little bit to be able to support either IPv4 or IPv6 addresses. From there, it would be the usual set of changes to connection setup.
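
As a sketch of what that extension might look like (assuming the modex entry stays fixed-size; the struct name and layout are illustrative, not the actual btl tcp definitions):

  /* Illustrative only -- one way to let the published endpoint address
   * carry either an IPv4 or an IPv6 address plus a family tag, while
   * keeping the modex entry a fixed size. */
  #include <stdint.h>
  #include <netinet/in.h>

  typedef struct {
      uint16_t addr_family;           /* AF_INET or AF_INET6 */
      uint16_t port;                  /* listening port, network byte order */
      union {
          struct in_addr  v4;         /* 4 bytes  */
          struct in6_addr v6;         /* 16 bytes */
      } addr;                         /* sized for the larger (IPv6) case */
  } example_btl_tcp_modex_addr_t;

Since every process in the universe reads that data, whichever layout is chosen has to be understood by all peers and kept in a consistent byte order.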

This was a fairly simple overview - I'd recommend starting with the tcp oob component and asking when you have questions about what you see. You don't need to run Open MPI jobs to test the tcp oob component - you can just use orterun to launch normal old unix commands. Something with a bit of stdio output will give a reasonable first test of the oob. I usually do something like:

  orterun -np 2 -host host_a,host_b ls -l $HOME

as I have enough files in my home directory that a page or two of standard I/O should be returned.

Hope this helps,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

