Hello Transarc,
I am the AFS cell admin at CMU-ECE. We recently upgraded our
fileserver to Ultrix 4.3 and in the process uncovered potential problems with
our current AFS tree implementation. For whatever reason, we clone many of
our system volumes on multiple servers. One motive, I was told, was to
"spread out" the use of these volumes, thus minimizing individual hardships
when a server crashes, is heavily loaded, or is taken off-line.
Unfortunately, this is not the case. In fact, by my calculations our users are
potentially WORSE off.
If a server remains on-line and I get rid of one of the system clones,
the clients seem to find new ones OK. A nudge from fs checkvol will sometimes
do the trick. However, if a server is taken off-line (either through
catastrophic failure, or just a simple bos shutdown), the clients cannot
re-assign to another clone of a volume that resided on that server.
HELP!!!! Is there any way to force clients to re-evaluate their
situation in the face of server downtime? If not, we are in BIG trouble with
our current cloning scheme. It seems that something radical like flushing the
entire cache for the client would do the trick, but this seems so drastic and
inefficient for hundreds of clients.
The problem is related to the "depth" of a volume in the cell. The deeper
the volume is in the cell tree, the higher the probability of failing to
access the volume given a single server shutdown. This is because a client may
be using clones of the parent volumes on servers all over the cell, so that if
any one of the parent volumes is lost we cannot access the child volumes, even
if the server that the child resides on is up.
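To put rough numbers on this, here is a back-of-the-envelope sketch in Python.
It is only a model, not a measurement from our cell. It assumes that each
volume on the path (the parents plus the child itself) is bound independently
and uniformly to one of n_servers, that exactly one server is down, and that
the client never fails over to another clone. The function name and the
example figure of 5 servers are made up for illustration.

```python
def p_unreachable(depth: int, n_servers: int) -> float:
    """Chance that a volume `depth` levels deep is unreachable when one server is down.

    Assumption (illustrative only): each of the `depth` volumes on the path is
    bound independently and uniformly to one of `n_servers` servers, and the
    client does not re-bind to another clone.
    """
    if depth < 0 or n_servers < 1:
        raise ValueError("depth must be >= 0 and n_servers must be >= 1")
    # Probability that every volume on the path avoids the downed server.
    p_all_up = ((n_servers - 1) / n_servers) ** depth
    return 1.0 - p_all_up


if __name__ == "__main__":
    # Hypothetical cell with 5 servers; print the risk at several depths.
    for depth in (1, 2, 4, 8):
        print(depth, round(p_unreachable(depth, 5), 3))
```

Under these assumptions the chance of losing access grows with every extra
level of the tree, which matches what we are seeing.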
Please let me know if you require additional info. Thank you for your
attention to this matter.
Sincerely,
Kris Webb
Research Systems Programmer
Dept. of Elec. and Comp. Engineering
Carnegie Mellon University
(412) 268-5141
[EMAIL PROTECTED]