Pierette,
While your suggestions are helpful, your configuration suggestion would
still not have prevented our AFS clients' failure, since our site
had read-only copies of the AFS directory available when the rs6000 servers
went dead. At the time of the rs6000s' failure our AFS cell had available
one server with read-only copies (SUN server #2) and one server with
read-write copies (master sync site, SUN server #1). I could see the
server holding the available RO copy when I executed "fs getserverprefs" from
the AFS server machines.
So while I will take your suggestion to put a read-only copy on SUN server #1
(the read-write site), your solution would still not have prevented our
downtime, since the problem of clients timing out before they could reach
the SUN server read-only copy would still have remained.
I would still like to know why there are no tools to EXCLUDE servers
from the server preference list or, at a minimum, why I cannot dynamically
configure the timeout value for contacting servers.
-Nancy Yeager
Message-Id: <[EMAIL PROTECTED]>
Date: Wed, 18 Aug 1993 11:36:56 -0400 (EDT)
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED], [EMAIL PROTECTED] (Nancy Yeager)
Subject: Re: Serverprefs for AFS clients
In-Reply-To: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
Status: R
Nancy,
You indicated that you had RO copies of critical volumes on the two
(dead) rs6000 servers, and the "master sync-site" (i.e., the RW version of
the volume) on one of the (living) sun servers. You did not mention
whether there were any RO versions of the critical volumes on either
of the sun servers.
This is a critical point. If ReadOnly volumes exist (note use
of term *exist* rather than *are available*), cache managers will
never utilize the ReadWrite version of the volume. The only way to
access the RW volume is via the "dot" path (or by special mounting).
This means if *all* RO copies are on dead servers, are offline, are
behind a network partition, etc, then clients will not be able to get
the data, even if the RW version of the volume is healthy, on a healthy
server and a healthy network.
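To make the rule concrete, here is a minimal Python sketch of the
selection behavior described above. It is a simplified illustrative
model, not the actual cache manager code; the names Site and pick_site
and the server labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    server: str      # hypothetical server label
    kind: str        # "RO" or "RW"
    reachable: bool  # False if the server is dead, offline, or partitioned away


def pick_site(sites: List[Site], dot_path: bool = False) -> Optional[Site]:
    """Simplified model: if any RO site exists, a normal path never uses the RW."""
    ro_sites = [s for s in sites if s.kind == "RO"]
    if ro_sites and not dot_path:
        # RO sites exist, so only they are candidates; a healthy RW is ignored.
        candidates = ro_sites
    else:
        # Either the "dot" path was used or the volume has no RO sites at all.
        candidates = [s for s in sites if s.kind == "RW"]
    for site in candidates:
        if site.reachable:
            return site
    return None


# All ROs on dead servers, RW healthy: the normal path finds nothing.
example_sites = [
    Site("rs6000-a", "RO", reachable=False),
    Site("rs6000-b", "RO", reachable=False),
    Site("sun-1", "RW", reachable=True),
]
normal_result = pick_site(example_sites)
dot_result = pick_site(example_sites, dot_path=True)
```

In this model normal_result is None even though the RW on "sun-1" is
reachable, while dot_result is the RW site, which matches the behavior
described above.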
There was a very long discussion on info-afs a few months ago as to
why we do not "fall back" on RW versions of a volume. I'm not going
to try to dredge up all the reasons - it was a very long discussion.
Perhaps someone else would care to summarize.
However - we do *very* strongly encourage keeping one RO copy of a
volume on the *same server and partition* as the RW. Two reasons for
this. First, the RO that is on the same server and partition as the
RW is a clone (just a copy of the header - not a full copy of each
file). It therefore is very small, but provides access to the same
set of files that all other (full copy) ReadOnly volumes do. In
training we refer to this as the "cheap replica".
The second reason is to prevent the frustration that you have
experienced, in which all your ROs were unavailable, but a perfectly
healthy RW was accessible but not used. If you keep a cheap replica,
then by definition, if the RW is available, one of the ROs is also
available, and clients will utilize that site.
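One way to check a cell for this is to audit a list of volume locations
for replicated volumes that lack a cheap replica. The following is a
minimal Python sketch under the assumption that you have already
collected each volume's RW and RO locations by some means; the
VolumeEntry structure, the function name, and the sample volume and
server names are all hypothetical and not part of any AFS tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Location = Tuple[str, str]  # (server, partition), e.g. ("sun-1", "/vicepa")


@dataclass
class VolumeEntry:
    name: str
    rw: Location
    ro: List[Location] = field(default_factory=list)


def missing_cheap_replica(entries: List[VolumeEntry]) -> List[str]:
    """Names of replicated volumes with no RO on the RW's server and partition."""
    return [e.name for e in entries if e.ro and e.rw not in e.ro]


# Made-up sample data for illustration only.
sample = [
    VolumeEntry("root.cell", ("sun-1", "/vicepa"),
                [("rs6000-a", "/vicepa"), ("rs6000-b", "/vicepa")]),
    VolumeEntry("tools", ("sun-1", "/vicepa"),
                [("sun-1", "/vicepa"), ("rs6000-a", "/vicepa")]),
    VolumeEntry("scratch", ("sun-1", "/vicepb")),  # not replicated at all
]
flagged = missing_cheap_replica(sample)
```

Here only "root.cell" is flagged: "tools" has an RO alongside its RW,
and "scratch" has no RO sites, so clients use its RW directly.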
Pierette VanRyzin
AFS Training
Transarc Corporation