Hi!

Using GT-4.0.6, we have a job with StageIn:

<fileStageIn>
<transfer>
<sourceUrl>gsiftp://racl00.inf-ra.uni-jena.de/bin/echo</sourceUrl>
<destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
</transfer>
</fileStageIn>

globusrun-ws complains about a timeout during the StageIn:

Non-Extended 3rd-party transfer grid.inf-ra.uni-jena.de:2811//bin/echo
--> racl00.inf-ra.uni-jena.de:2811//home/racl/limmer/my_echo failed
[Caused by: Reply wait timeout. (error code 4)]

The cause lies in gridftp: the logfile shows a hang after RETR, and the
job is cancelled once the timeout expires.

strace shows that racl00 announces its RFC1918 address 192.168.1.12, but
the remote end cannot reach this IP due to network configuration.

On the other hand, globus-url-copy uses passive transfer and everything
is fine.

A Google search suggests that one should not put private IPs in
/etc/hosts when a machine has multiple interfaces. We do have such a
line, though, and we're going to keep it: it enables local cluster
traffic to be routed over the internal network.

So I changed nsswitch.conf to prefer dns over files, but then we end up
with IPv6 addresses in the hostname (GSI complains with something
like "expected host/racl00.inf-ra.uni-jena.de, got 2001:638:c:a00e::2").

However, neither changing nsswitch.conf nor modifying /etc/hosts is the
solution. Using passive gsiftp would be a solution, but ultimately I
believe the GT code is wrong.

I'd like to have a runtime config switch to suppress the announcement of
RFC1918 addresses. These addresses are the root of all evil. Skipping
them is pretty easy: just loop over getaddrinfo()'s result list and
drop them:

      if ((htonl(0x0a000000) == (inaddr->sin_addr.s_addr &   /* 10/8 */
                            opal_net_prefix2netmask(8)))  ||
          (htonl(0xac100000) == (inaddr->sin_addr.s_addr &   /* 172.16/12 */
                            opal_net_prefix2netmask(12))) ||
          (htonl(0xc0a80000) == (inaddr->sin_addr.s_addr &   /* 192.168/16 */
                            opal_net_prefix2netmask(16))) ||
          (htonl(0xa9fe0000) == (inaddr->sin_addr.s_addr &   /* 169.254/16 */
                            opal_net_prefix2netmask(16)))) {
          /* skip this address */
      }

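where opal_net_prefix2netmask is

uint32_t opal_net_prefix2netmask(uint32_t prefixlen)
{
    /* unsigned constant: shifting a signed 1 would overflow int */
    return htonl (((UINT32_C(1) << prefixlen) - 1) << (32 - prefixlen));
}

(shamelessly copied from Open MPI, with the constant made unsigned;
valid for prefix lengths 1..31)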
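
For illustration, here is a minimal, self-contained sketch of such a
filter. It is not GT4 or Open MPI code: the names prefix2netmask,
in_prefix, is_unroutable_ipv4 and first_announceable are made up for
this example. It assumes a list in the form returned by getaddrinfo()
and lets non-IPv4 entries pass untouched.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Netmask in network byte order for an IPv4 prefix length 0..32. */
uint32_t prefix2netmask(unsigned prefixlen)
{
    assert(prefixlen <= 32);
    /* prefixlen 0 is handled separately to avoid a 32-bit shift */
    return prefixlen ? htonl(~(uint32_t)0 << (32 - prefixlen)) : 0;
}

/* addr is in network byte order, net is in host byte order. */
int in_prefix(uint32_t addr, uint32_t net, unsigned prefixlen)
{
    return (addr & prefix2netmask(prefixlen)) == htonl(net);
}

/* Nonzero for RFC1918 (10/8, 172.16/12, 192.168/16) and 169.254/16. */
int is_unroutable_ipv4(const struct in_addr *a)
{
    uint32_t s = a->s_addr;
    return in_prefix(s, 0x0a000000u, 8)  ||
           in_prefix(s, 0xac100000u, 12) ||
           in_prefix(s, 0xc0a80000u, 16) ||
           in_prefix(s, 0xa9fe0000u, 16);
}

/* First entry of a getaddrinfo()-style list that is not a private
 * IPv4 address, or NULL if every entry was dropped. */
const struct addrinfo *first_announceable(const struct addrinfo *list)
{
    const struct addrinfo *ai;
    for (ai = list; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET) {
            const struct sockaddr_in *sin =
                (const struct sockaddr_in *)(const void *)ai->ai_addr;
            if (is_unroutable_ipv4(&sin->sin_addr))
                continue;   /* skip, as in the condition above */
        }
        return ai;
    }
    return NULL;
}
```

A caller would pass the result list of getaddrinfo() to
first_announceable() and announce the returned address. A NULL result
means only private addresses were found, and what to do then (fail, or
fall back to passive mode) is a policy decision left open here.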


I don't know how GT4 obtains the local addresses: whether it employs
Java or C, and whether it uses gethostname() and gethostbyname() (not
IPv6 capable), getaddrinfo() (preferred, at least for C) or local
interface discovery (probably not, as SIOCGIFADDR only works for IPv4
and there's no portable way for IPv6).

To sum things up: Can you add a runtime switch for passive ftp and/or a
runtime switch for dropping RFC1918 addresses (plus 169.254/16, see
RFC3330)? This would remove the burden of messing around with
/etc/hosts, along with all the problems caused by RFC1918 addresses
leaking into public networks.

Or are there other solutions?


TIA

-- 
mail: [EMAIL PROTECTED]         http://adi.thur.de      PGP/GPG: key via 
keyserver
