Nancy,
You indicated that you had RO copies of critical volumes on the two
(dead) rs6000 servers, and the "master sync-site" (i.e. the RW version of
the volume) on one of the (living) sun servers. You did not mention
whether there were any RO versions of the critical volumes on either
of the sun servers.
This is a critical point. If ReadOnly volumes exist (note the use
of the term *exist* rather than *are available*), cache managers will
never utilize the ReadWrite version of the volume. The only way to
access the RW volume is via the "dot" path (or by special mounting).
This means if *all* RO copies are on dead servers, are offline, are
behind a network partition, etc, then clients will not be able to get
the data, even if the RW version of the volume is healthy, on a healthy
server and a healthy network.
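To make the rule concrete, here is a minimal Python sketch of the
selection logic described above. It is a toy model, not Cache Manager
code: the Site class, the reachable flag, the want_rw argument (standing
in for access via the "dot" path), and the server names are all
assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Site:
    server: str      # hypothetical server name
    kind: str        # "RO" or "RW"
    reachable: bool  # hypothetical flag: server up and network path OK


def pick_site(sites: List[Site], want_rw: bool = False) -> Optional[Site]:
    """Return the site a client would read from, or None if none is usable.

    The choice of RO vs. RW depends on whether RO sites *exist*,
    not on whether any of them is currently reachable.
    """
    ro_exists = any(s.kind == "RO" for s in sites)
    wanted = "RW" if (want_rw or not ro_exists) else "RO"
    for s in sites:
        if s.kind == wanted and s.reachable:
            return s
    return None  # no fallback from RO to RW


# The situation from this thread: both ROs on dead servers, RW healthy.
sites = [
    Site("rs6000-a", "RO", reachable=False),
    Site("rs6000-b", "RO", reachable=False),
    Site("sun-a", "RW", reachable=True),
]

default_choice = pick_site(sites)               # normal path: nothing usable
dot_choice = pick_site(sites, want_rw=True)     # "dot" path: the RW on sun-a
```

In this model the normal lookup comes back empty even though the RW
site is reachable, which is the behavior you ran into.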
There was a very long discussion on info-afs a few months ago as to
why we do not "fall back" on RW versions of a volume. I'm not going
to try to dredge up all the reasons - it was a very long discussion.
Perhaps someone else would care to summarize.
However - we do *very* strongly encourage keeping one RO copy of a
volume on the *same server and partition* as the RW. Two reasons for
this. First, the RO that is on the same server and partition as the
RW is a clone (just a copy of the header - not a full copy of each
file). It therefore is very small, but provides access to the same
set of files that all other (full copy) ReadOnly volumes do. In
training we refer to this as the "cheap replica".
The second reason is to prevent the frustration that you have
experienced, in which all your ROs were unavailable, but a perfectly
healthy RW was accessible but not used. If you keep a cheap replica,
then by definition, if the RW is available, one of the ROs is also
available, and clients will utilize that site.
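As a rough illustration, the Python sketch below checks a volume's site
list for a cheap replica, meaning an RO on the same server and partition
as the RW. The tuple layout and the server and partition names are
assumptions for the example; in a real cell you would read the site list
from the VLDB, and the RO site itself would be defined with vos addsite
and populated with vos release.

```python
from typing import List, Tuple

# One entry per site: (server, partition, kind), where kind is "RO" or "RW".
# This layout is hypothetical, chosen only for the sketch.
SiteEntry = Tuple[str, str, str]


def has_cheap_replica(sites: List[SiteEntry]) -> bool:
    """True if some RO site shares a server and partition with the RW site."""
    rw_locations = {(srv, part) for srv, part, kind in sites if kind == "RW"}
    return any(
        kind == "RO" and (srv, part) in rw_locations
        for srv, part, kind in sites
    )


# Layout described in this thread: ROs only on the rs6000s, RW on a sun.
before = [
    ("rs6000-a", "/vicepa", "RO"),
    ("rs6000-b", "/vicepa", "RO"),
    ("sun-a", "/vicepa", "RW"),
]

# Same layout with one extra RO next to the RW.
after = before + [("sun-a", "/vicepa", "RO")]

missing = not has_cheap_replica(before)  # True: no RO alongside the RW
present = has_cheap_replica(after)       # True: cheap replica in place
```

A check along these lines could be run over every replicated volume to
flag the ones whose ROs all live away from the RW.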
Pierette VanRyzin
AFS Training
Transarc Corporation