On Mar 31, 2006, at 3:44 AM, Christian Kauhaus wrote:
> First, I'd like to introduce myself. I'm Christian Kauhaus and I am
> currently working at the Department of Computer Architecture at the
> University of Jena (Germany). Our work group is digging into how to
> connect several clusters on a campus.
>
> As part of our research, we'd like to evaluate the use of IPv6 for
> multi-cluster coupling. Therefore, we need to run Open MPI over
> TCP/IPv6.
>
> Last year during EuroPVM/MPI I already had a short chat with Jeff
> Squyres about this, but now we actually do have the time to work on
> this.

Great! We currently only have IPv4 support, but IPv6 has always been
on the wishlist. Most of the developers in the States don't have
access to IPv6 networks, so it hasn't been a need we've had time to
address at this point. It would be great if
someone else could take a stab at it.
> First, we are interested in integrating IPv6 support into the tcp
> btl. Does anyone know if there is someone already working on this?
> If so, we would be glad to cooperate. If not, we would start on it
> ourselves, although we would need some help from the Open MPI
> developer community regarding Open MPI / ORTE internals.

As far as I'm aware, there is no one working on IPv6 support for Open
MPI. We would welcome anyone willing to work on the support :). And
we'll be as responsive as possible to requests for help / advice -
this list is the best forum for that type of discussion.
Are your hosts configured for both IPv4 and IPv6 traffic (or are they
IPv6 only)? I ask because that will determine what your first step
is. There are two TCP communication channels in Open MPI -- the tcp
oob component, used by the run-time layer for out-of-band
communication, and the tcp btl component, used by the MPI layer for
MPI traffic. Without a working tcp oob component, it's pretty close
to impossible to start the tcp btl, so if you only have IPv6 on your
machines, that will dictate starting with the tcp oob component.
Otherwise, you could start with either component (although both will
eventually need to be updated).
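
If your hosts are dual-stack, one low-level detail worth knowing up
front is how a single listening socket can serve both families. Here
is a minimal sketch, not Open MPI code (the function name is made up,
and whether clearing IPV6_V6ONLY is appropriate is platform-dependent):

  /* A minimal sketch, not Open MPI code: a listening socket that
   * accepts IPv6 connections and, on dual-stack hosts that allow it,
   * IPv4 connections as v4-mapped addresses. */
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <string.h>
  #include <unistd.h>

  int open_dual_stack_listener(unsigned short port)
  {
      int fd = socket(AF_INET6, SOCK_STREAM, 0);
      if (fd < 0) return -1;

      int off = 0;
      /* 0 = also accept IPv4 peers as ::ffff:a.b.c.d (where supported) */
      setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off));

      struct sockaddr_in6 sin6;
      memset(&sin6, 0, sizeof(sin6));
      sin6.sin6_family = AF_INET6;
      sin6.sin6_addr   = in6addr_any;
      sin6.sin6_port   = htons(port);

      if (bind(fd, (struct sockaddr *)&sin6, sizeof(sin6)) < 0 ||
          listen(fd, SOMAXCONN) < 0) {
          close(fd);
          return -1;
      }
      return fd;
  }

If you end up keeping separate v4 and v6 sockets instead, the same
idea applies, just with one bind per family.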
The oob tcp component (code is in orte/mca/oob/tcp/) is fairly
straightforward, especially if all you need to deal with is
connection setup. There are really two pieces to be aware of: in
oob_tcp.c there is some code dealing with uri strings - this is used
by the upper layers to ask the oob component for its contact address
(as a uri string) and to give the oob component a uri string and
associate it with an orte_process_name. The peer connection code is
in a combination of oob_tcp_peer.[h,c] and oob_tcp_addr.[h,c]. I'm
sure you will have to modify oob_tcp_addr.[h,c], and I think you'll
probably have to modify oob_tcp_peer.[c,h] as well.
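
For the uri-string side of things, one possible approach (just a
sketch, not the actual oob_tcp parser; the function name and the
bracket convention for IPv6 literals are assumptions on my part) is
to hand the host and port parts to getaddrinfo(), so that either
family comes back as a ready-to-use sockaddr:

  /* A sketch only -- not the actual oob_tcp parser.  Turns a contact
   * string like "tcp://192.168.0.1:5000" or "tcp://[fe80::1]:5000"
   * into a sockaddr via getaddrinfo(), so both families work. */
  #include <sys/socket.h>
  #include <netdb.h>
  #include <stdio.h>
  #include <string.h>

  int contact_to_sockaddr(const char *uri, struct sockaddr_storage *out)
  {
      char host[256], port[16];
      const char *p = strstr(uri, "://");
      if (NULL == p) return -1;
      p += 3;

      if ('[' == *p) {                      /* IPv6 literal in brackets */
          const char *end = strchr(p, ']');
          if (NULL == end || ':' != end[1]) return -1;
          snprintf(host, sizeof(host), "%.*s", (int)(end - p - 1), p + 1);
          snprintf(port, sizeof(port), "%s", end + 2);
      } else {                              /* IPv4 literal */
          const char *colon = strrchr(p, ':');
          if (NULL == colon) return -1;
          snprintf(host, sizeof(host), "%.*s", (int)(colon - p), p);
          snprintf(port, sizeof(port), "%s", colon + 1);
      }

      struct addrinfo hints, *res;
      memset(&hints, 0, sizeof(hints));
      hints.ai_family   = AF_UNSPEC;        /* accept v4 or v6 */
      hints.ai_socktype = SOCK_STREAM;
      hints.ai_flags    = AI_NUMERICHOST;   /* literals only, no DNS */

      if (0 != getaddrinfo(host, port, &hints, &res)) return -1;
      memcpy(out, res->ai_addr, res->ai_addrlen);
      freeaddrinfo(res);
      return 0;
  }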
I should diverge for a second... Every process in Open MPI has an
orte_process_name. This value will be unique between processes that
can connect to each other. When I want to send an out-of-band
message to a remote host, I send to that orte_process_name and the
communication layer figures out how to get the message over there.
So if the upper layers associate an orte_process_name with a uri
string, you'll use that information to contact that
orte_process_name, should you ever need to send data that way.
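
To make that name-to-address association concrete, here is a purely
hypothetical illustration (none of these types exist in ORTE): a
per-peer entry that records, for one process name, the contact
addresses parsed from its uri string. Storing them as
sockaddr_storage leaves room for both IPv4 and IPv6 peers:

  /* Purely hypothetical illustration -- not real ORTE data structures.
   * Per peer, keep the contact addresses parsed from its uri string
   * and pick one when a message has to be delivered to that name. */
  #include <sys/socket.h>

  #define MAX_PEER_ADDRS 4

  struct example_peer_entry {
      /* stand-in for orte_process_name_t (the real type lives in ORTE) */
      unsigned long jobid;
      unsigned long vpid;

      /* contact addresses advertised by this peer (v4 and/or v6) */
      struct sockaddr_storage addrs[MAX_PEER_ADDRS];
      int num_addrs;
      int current;      /* which address we are currently trying */
  };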
The tcp btl is mostly the same type of thing. The main difference is
how peers are set up. Instead of a char string to share endpoint
connections, we have what we call the "modex". This is basically a
write-once, read-many global data store. So the tcp btl
puts a fixed-size structure into the modex data (behind the scenes,
this is stored in our gpr data store), and each process in the
universe can get that data by looking it up against its process name
(actually, in this case, it's a data structure called the ompi_proc,
which is an orte_process_name, plus data needed for each MPI
process). So we'd need to extend that data structure a bit to
support either IPv4 or IPv6 addresses. From there, it would be the
usual connection setup changes.
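
As a very rough sketch of what "extend that data structure a bit"
could mean (the field and type names below are made up; the real
record lives in ompi/mca/btl/tcp/), one option is an explicit address
family plus a union, which keeps the modex entry fixed-size:

  /* Hypothetical sketch of how the fixed-size record the tcp btl
   * publishes through the modex could carry either address family. */
  #include <netinet/in.h>
  #include <stdint.h>

  struct example_btl_tcp_addr {
      uint8_t  addr_family;        /* AF_INET or AF_INET6 */
      uint8_t  addr_inuse;         /* bookkeeping flag */
      uint16_t addr_port;          /* TCP port, network byte order */
      union {
          struct in_addr  ipv4;    /*  4 bytes */
          struct in6_addr ipv6;    /* 16 bytes */
      } u;                         /* union keeps the record fixed-size */
  };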
This was a fairly simple overview - I'd recommend starting with the
tcp oob component and asking when you have questions about what you
see. You don't need to run Open MPI jobs to test the tcp oob
component - you can just use orterun to launch normal old unix
commands. Something with a bit of stdio output will give a
reasonable first test of the oob. I usually do something like:
orterun -np 2 -host host_a,host_b ls -l $HOME
as I have enough files in my home directory that a page or two of
standard I/O should be returned.
Hope this helps,
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/