On Fri, 31 Mar 2006, Christian Kauhaus wrote:
So the resolver already does the complicated work for us, since it returns all addresses associated to a given target (hostname or IP-addr notation) in the order of decreasing preference.
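To make the resolver behaviour mentioned above concrete, here is a minimal sketch in Python, whose `socket.getaddrinfo` wraps the C library call of the same name. The helper name `resolve_all` is made up for illustration and is not part of Open MPI. The ordering of the results is whatever the system resolver returns (on many systems this follows the address selection rules of RFC 3484, but that depends on the local configuration).

```python
import socket

def resolve_all(host, port):
    """Return (label, address) pairs in the order the resolver gives them.

    Hypothetical helper for illustration only; not an Open MPI function.
    """
    infos = socket.getaddrinfo(
        host, port,
        family=socket.AF_UNSPEC,      # ask for IPv4 and IPv6 alike
        type=socket.SOCK_STREAM,
        proto=socket.IPPROTO_TCP,
    )
    results = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        if family == socket.AF_INET6:
            label = "IPv6"
        elif family == socket.AF_INET:
            label = "IPv4"
        else:
            continue  # ignore anything that is not an IP address family
        results.append((label, sockaddr[0]))
    return results
```

Passing an address literal such as `"127.0.0.1"` instead of a hostname goes through the same call, which matches the "hostname or IP-addr notation" remark.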
What you propose here should work for the case of a single BTL that handles both IPv4 and IPv6. How about the case of 2 BTLs? (It's not clear to me from the rest of the discussion whether one solution is better than the other.)
Now we also have two different protocols on each interface.
This could theoretically happen in other situations as well. For example, it's possible to set up TCP/IPv4 (and I guess even v6) over Myrinet at the same time as GM over Myrinet, which also brings it to 2 (or even 3) protocols over the same physical connection. So how should these situations be handled? (This is more of a general question, not related to the IPv6 implementation.)
- We introduce another parameter, which allows an IP version selection both globally and on a per-interface basis. Something like: IPv4-only / prefer IPv4 / auto (resolver) / prefer IPv6 / IPv6-only
The third approach would possibly be the cleanest one.
I also like it, with emphasis on "both globally and per-interface".
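As a rough sketch of how such a five-way selection could act on the resolver output, with a per-interface setting overriding the global one: all names here (`POLICIES`, `policy_for`, `apply_policy`, the policy strings, the interface names) are assumptions made up for illustration, not existing Open MPI parameters.

```python
import socket

# Hypothetical policy names mirroring the five options proposed above.
POLICIES = ("ipv4-only", "prefer-ipv4", "auto", "prefer-ipv6", "ipv6-only")

def policy_for(interface, global_policy, per_interface):
    """Per-interface setting wins; otherwise fall back to the global one."""
    return per_interface.get(interface, global_policy)

def apply_policy(addrinfos, policy):
    """Filter or reorder getaddrinfo-style tuples according to the policy.

    Only the first tuple element (the address family) is inspected.
    """
    if policy not in POLICIES:
        raise ValueError("unknown IP version policy: %r" % (policy,))
    v4 = [a for a in addrinfos if a[0] == socket.AF_INET]
    v6 = [a for a in addrinfos if a[0] == socket.AF_INET6]
    if policy == "ipv4-only":
        return v4
    if policy == "ipv6-only":
        return v6
    if policy == "prefer-ipv4":
        return v4 + v6
    if policy == "prefer-ipv6":
        return v6 + v4
    return list(addrinfos)  # "auto": keep the resolver's own order
```

For example, with a global setting of "auto" and an override of "ipv4-only" for one interface, `policy_for("eth1", "auto", {"eth1": "ipv4-only"})` picks the override, and `apply_policy` then drops every IPv6 entry for that interface.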
Since it is standard behaviour for every IPv6 app to try all known addresses for the target host until any one succeeds, we are also able to connect to an IPv6-enabled host where the target daemon does not listen on an IPv6 interface.
Err, it's not OpenMPI, but the rsh/ssh client that tries the connections. My point however is a bit different as it also relates to the authentication behind the connection, where the IP (and therefore its flavour) which is used for making the connection counts:
- if you pass an IPv6 address to a non-IPv6 aware rsh/ssh client, the connection will fail. So the upper level which executes the rsh/ssh client would need to handle the fallback to the different addresses. OTOH, if the rsh/ssh client is IPv6 aware, it might already try them, which will lead to an increased time to make (or decide that it is not possible to make) the connection.
- if you try to connect over IPv6 to an ssh daemon that only has hostbased auth. configured for IPv4 addresses (or vice versa), the hostbased auth. will fail (and most likely will ask for a password). This is quite likely to happen unknowingly if the distribution (like RHEL and Fedora) turns on IPv6 by default and assigns link-local addresses, resulting in a working IPv6 configuration on the local network - which would apply to most clusters - without the admin having to do anything about it (guess how I found this out ;-))
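The fallback from the first point, where the calling layer walks the address list itself, might look roughly like the sketch below. It uses plain TCP sockets rather than an rsh/ssh client, and `connect_first_working` with its `timeout` parameter is a made-up name for illustration. Note that the worst-case delay is the per-address timeout multiplied by the number of addresses, which is the increased connection time mentioned above.

```python
import socket

def connect_first_working(host, port, timeout=2.0):
    """Try every resolver result in order; return the first connected socket.

    Hypothetical helper for illustration. Raises OSError if every
    address fails.
    """
    last_error = None
    infos = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM, proto=socket.IPPROTO_TCP
    )
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)   # bound the wait for each address
            sock.connect(sockaddr)
            return sock                # first success wins
        except OSError as exc:
            last_error = exc           # remember the failure, try the next one
            if sock is not None:
                sock.close()
    if last_error is None:
        last_error = OSError("no addresses returned for %r" % (host,))
    raise last_error
```

Python's standard `socket.create_connection` performs the same kind of loop. Such a loop only covers reachability: it does not help with the second point, since a connection that succeeds over the "wrong" IP version can still fail hostbased authentication.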
For example, we ran several weeks without an IPv6-enabled rsh, which is used to handle MPI job startup on the cluster, without any problems.
What do you mean by "IPv6-enabled rsh"? Was it the daemon, the client, or both?
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de