Re: Some questions on multi-homed AFS machines

W. Phillip Moore Thu, 9 May 1996 16:58:48 GMT
>>>>> "Christer" == Christer Bernerus <[EMAIL PROTECTED]> writes:

Christer> Transarc says that only plain AFS fileservers work properly.

Out of the box, this is true.  Multihomed clients and database servers
can be made to work only if you play some very careful games with
routing tables and hostname configurations.

Christer> Q1: Why doesn't multi-homed AFS clients work ?

If a multihomed client uses more than one IP address to communicate
with a fileserver, then it will appear to the fileserver as more than
one host.  The IP address is assumed to be a unique identifier for a
client.  The callback mechanism fails miserably, and the effect on a
client whose routing tables flop over to use a different IP address is
random cache inconsistency and/or corruption.
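
To make the "more than one host" point concrete, here is a toy sketch
of a server that tracks callbacks per client IP address.  It is not the
real fileserver data structure; the class, method names and addresses
are all made up for illustration.

```python
from collections import defaultdict


class ToyFileserver:
    """Illustrative only: identifies each client purely by its source IP."""

    def __init__(self):
        # file name -> set of client IPs that hold a callback promise
        self.callbacks = defaultdict(set)

    def fetch(self, client_ip, name):
        # The server remembers that this IP has a cached copy of the file.
        self.callbacks[name].add(client_ip)

    def has_callback(self, client_ip, name):
        return client_ip in self.callbacks[name]

    def known_hosts(self):
        # Every distinct source IP is counted as a separate host.
        hosts = set()
        for ips in self.callbacks.values():
            hosts |= ips
        return hosts


# One multihomed client with two interfaces (documentation-range addresses).
ADDR_A = "192.0.2.10"
ADDR_B = "198.51.100.10"

server = ToyFileserver()
server.fetch(ADDR_A, "file1")   # request leaves via interface A
server.fetch(ADDR_B, "file2")   # routing flips; next request leaves via B

print(len(server.known_hosts()))             # distinct "hosts" the server sees
print(server.has_callback(ADDR_B, "file1"))  # state held under A is unknown to B
```

In this model the single client is recorded as two hosts, and the state
registered under one address is invisible under the other.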

Christer> Q2: If I have a multi homed AFS fileserver, I cannot run
Christer> afsd on it right ?

Maybe not.  See above.

Christer> Q3: Would it work if I have a secondary server backbone
Christer> network that uses static routes ?

Yes.  If you can guarantee that the ONLY interface that will ever be
used to communicate with the fileservers is the primary interface (or
actually just one and only one of the interfaces on the machine,
although the primary probably makes sense), then AFS will work.

Christer> Q4: Could such a setup work even with database machines ?

Yes.  I've made it work here, and I have seen posts from others on
info-afs about making the same setup work elsewhere.

Database servers are a little more complex.  You have to guarantee
that the IP addresses listed in /usr/afs/etc/CellServDB are the only
ones the database server processes will ever "see".  
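
A rough way to check that guarantee is to compare the source addresses
you actually observe against the ones listed in CellServDB.  The sketch
below assumes the usual CellServDB layout (a ">cellname" line followed
by "address #hostname" lines); the sample cell, hostnames and observed
addresses are hypothetical.

```python
import ipaddress

# Hypothetical CellServDB contents, assuming the usual layout.
SAMPLE_CELLSERVDB = """\
>example.com            #Hypothetical cell
192.0.2.1               #db1.example.com
192.0.2.2               #db2.example.com
"""


def listed_addresses(text):
    """Return the set of server addresses listed in CellServDB-style text."""
    addrs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and cell-name lines
        field = line.split("#", 1)[0].strip()  # drop the "#hostname" comment
        if field:
            addrs.add(ipaddress.ip_address(field))
    return addrs


def unlisted(observed, text):
    """Return observed source addresses that CellServDB does not list."""
    listed = listed_addresses(text)
    extra = {ipaddress.ip_address(a) for a in observed} - listed
    return [str(a) for a in sorted(extra)]


# Addresses seen as packet sources (made up for this example).
observed = ["192.0.2.1", "198.51.100.7"]
print(unlisted(observed, SAMPLE_CELLSERVDB))
```

Any address this reports is one a database server was seen using that
its peers would not recognize from CellServDB.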

Christer> Q5: What work is going on to make multi-homed AFS machines -
Christer> other than file servers - work ?

Answers are basically given above.  It's not easy.

Now, at the risk of exposing myself to a promise I can't keep, let me
describe how we handled this here at Morgan Stanley.  The routing
table approach won't work in our environment for several reasons I
don't feel like taking the time to document.  Basically, we have lots
of multihomed machines on which the above guarantees were impossible.

The solution we implemented was to develop a utility which modifies
the IP address in the UDP socket structure in the kernel after it has
been created by afsd.  Normally, most applications open a UDP socket
by specifying 0 or the macro INADDR_ANY.

Sockets created like this will accept datagrams sent to ANY of the
host machine's valid IP addresses on that particular UDP port.  It
also means that when sending packets using that port number, the From
address in the IP packet will be the IP address of the interface used
to send the packet.

In the case of afsd and port 7001 on a multihomed client, the IP
address seen by the fileserver is just the IP address of the client
interface used to communicate with the fileserver, simply because afsd
opens UDP port 7001 with INADDR_ANY.

OK, I'm starting to ramble, so I'll get to the point.

We developed a utility called "ipedit" which finds the kernel address
of the UDP socket specified on the command line, and writes the given
IP address into it.  This forces afsd (which is oblivious to the change, of
course) to generate outgoing IP packets which all use the given IP
address in the From header, regardless of the interface they use to
leave the client.
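
ipedit itself patches kernel memory, which can't be shown portably
here.  As a rough user-space analogue of the effect, the sketch below
compares a UDP socket bound to the wildcard address with one bound to a
specific address.  It uses only the loopback address so it runs on a
single machine; the helper name is made up, and on a real multihomed
host you would substitute one of the host's own addresses.

```python
import socket

LOOPBACK = "127.0.0.1"


def source_seen_by_peer(bind_addr):
    """Bind a UDP socket to bind_addr, send one datagram to a local
    receiver, and return (sender's local address, source IP the
    receiver saw)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((LOOPBACK, 0))      # port 0: let the kernel pick
        receiver.settimeout(2.0)
        sender.bind((bind_addr, 0))
        local_ip = sender.getsockname()[0]
        sender.sendto(b"ping", receiver.getsockname())
        _, (peer_ip, _) = receiver.recvfrom(16)
        return local_ip, peer_ip
    finally:
        sender.close()
        receiver.close()


# Wildcard bind: the socket has no fixed address, so the kernel fills in
# the source address of whichever interface the packet leaves by.
wildcard = source_seen_by_peer("0.0.0.0")

# Specific bind: the source address is fixed up front.
pinned = source_seen_by_peer(LOOPBACK)

print("wildcard:", wildcard)
print("pinned:  ", pinned)
```

With the wildcard bind the socket's own address is unset and the peer
sees whatever the kernel chose per packet; with the specific bind the
address is fixed before any packet is sent.  Binding in user space only
works if you control the program that opens the socket, which is not
the case for afsd.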

This guarantees that even on multihomed clients, the fileservers will
only see ONE of their IP addresses, and thus, AFS client code works.
This doesn't provide you with any sort of redundancy if that one
interface fails.  We weren't trying to make AFS work redundantly on
multihomed clients, we were just trying to make it work at all.

We use the same approach to change the socket addresses of all the
database server processes as well.  The ipedit utility also comes in
handy for a number of non-AFS applications.

We've ported the ipedit utility to SunOS 4.1.3, Solaris 2.x, AIX 3.x
and 4.x, but nothing else thus far.

I don't believe giving the code away is a big deal, but the only
problem I can see is that it looks an awful lot like the SunOS netstat
source, since that's what we started with (netstat knew how to find the
socket addresses, so initially, we just hacked it to scribble on them).