On Tue, 14 Jul 1998, Nathan Rawling wrote:
> 1) Server with bad database copies, or no database copies elected as sync
> site. I've heard unofficial remarks from Transarc that a DB server
> with no databases will not allow itself to become the sync site. With
> that said, I'm not sure I'd risk it. Before you touch anything, make
> sure the recovery state (from udebug) is 1f.
Which machine is the sync site is irrelevant. Every election is followed by
recovery, and one of the recovery steps is for the sync site to query every
server, find out which one has the most recent database version, fetch that
database, and distribute it to the remaining servers. A machine with no
database (sync site or not) should not be a problem. A machine with a
corrupt database that appears to be the most current will be a problem
regardless of whether that machine ever becomes sync site.
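As a rough illustration of that recovery step, here is a minimal Python
sketch. It is not Ubik's actual code. It assumes each server reports its
database version as an (epoch, counter) pair, that a larger pair means a
newer database, and that a server with no database reports (0, 0). The host
names and the function pick_best_copy are made up for the example.

```python
# Sketch only: models the "find the newest copy" step of recovery,
# not the real Ubik implementation.

def pick_best_copy(versions):
    """Return (host, version) for the newest database among `versions`.

    `versions` maps host name -> (epoch, counter); (0, 0) is assumed to
    mean "this server has no database".
    """
    # Tuples compare element by element, so the epoch is compared first
    # and the counter only breaks ties.
    return max(versions.items(), key=lambda item: item[1])


# Hypothetical cell: afsdb2 was just installed and has no database yet.
reported = {
    "afsdb1": (900412345, 17),
    "afsdb2": (0, 0),
    "afsdb3": (900412345, 12),
}

best_host, best_version = pick_best_copy(reported)
# Every server whose version differs from the best one needs a fresh copy.
stale = sorted(h for h, v in reported.items() if v != best_version)
print(best_host, best_version, stale)
```

Under these assumptions the empty server can never win the comparison, no
matter who runs it, whereas a corrupt copy carrying the highest version
would win and be distributed to everyone.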
> 2) Deadlocked cell election. Make sure that at any given time, the servers
> you want running and participating in the cell are the only ones in the
> /usr/afs/etc/CellServDB file. Typos can really hurt you here by
> prolonging your election indefinitely.
Absolutely true. The server-side CellServDB must contain exactly the set
of servers that are database servers, and no others. Anything else will
give you extremely strange behaviour.
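One way to catch such typos before restarting anything is to compare the
file against the list of machines you intend to run as database servers.
The sketch below is only an illustration: it assumes the usual CellServDB
layout (a ">cellname #comment" line followed by "address #hostname" lines),
and the cell name, addresses, and the helper db_servers are invented.

```python
def db_servers(cellservdb_text, cell):
    """Return the set of addresses listed for `cell` in CellServDB text."""
    servers, in_cell = set(), False
    for line in cellservdb_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # ">cellname  #comment" starts a new cell entry.
            in_cell = line[1:].split("#", 1)[0].strip() == cell
        elif in_cell:
            # "address  #hostname" names one database server.
            address = line.split("#", 1)[0].strip()
            if address:
                servers.add(address)
    return servers


# Hypothetical server-side file with a typo: .21 was meant to be .12.
SAMPLE = """\
>example.edu        #Example cell
192.0.2.10          #afsdb1.example.edu
192.0.2.11          #afsdb2.example.edu
192.0.2.21          #afsdb3.example.edu
"""

intended = {"192.0.2.10", "192.0.2.11", "192.0.2.12"}
listed = db_servers(SAMPLE, "example.edu")
print("listed but not a DB server:", sorted(listed - intended))
print("DB server but not listed:  ", sorted(intended - listed))
```

Both differences should be empty on every database server before you let
an election proceed.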
> 3) AFS clients don't have any valid DB servers. This happened to me,
> it wasn't pretty, don't let it happen to you. It doesn't really hurt
> the clients any to have CellServDB files that include servers that aren't
> there.
>
> 4) AFS clients' CellServDB files don't contain the sync-site. The exact
> results of this escape me at the moment, but they're not pleasant.
If a client's CellServDB doesn't contain the sync site, it can't do any
operation that requires talking to the sync site. In general, add new
servers to the clients' CellServDB before bringing them up, and remove old
servers after shutting them down. This way you avoid both of the above
problems, and all you lose is a bit of performance.
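The ordering matters because it keeps one condition true at every step:
each running database server is already listed on the clients. The short
Python sketch below walks through replacing one server and checks that
condition after each step. The host names and the function swap_server are
hypothetical, and the model ignores the server-side CellServDB entirely.

```python
def swap_server(clients, running, old, new):
    """Replace `old` with `new`, checking the client view after each step.

    `clients` is the set of servers in the clients' CellServDB and
    `running` is the set of database servers that are actually up.
    """
    clients, running = set(clients), set(running)
    steps = [
        ("add new server to clients' CellServDB", clients.add, new),
        ("bring new server up", running.add, new),
        ("shut old server down", running.discard, old),
        ("remove old server from clients' CellServDB", clients.discard, old),
    ]
    history = []
    for label, action, host in steps:
        action(host)
        # Condition: every running DB server is known to the clients,
        # so whichever one is sync site can always be reached.
        history.append((label, running <= clients))
    return clients, running, history


clients, running, history = swap_server(
    {"afsdb1", "afsdb2"}, {"afsdb1", "afsdb2"}, old="afsdb2", new="afsdb3"
)
for label, ok in history:
    print(("ok  " if ok else "BAD ") + label)
```

Doing the steps in any other order leaves a window in which a running
server is missing from the clients' list.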
I don't know how to avoid DOA hardware. If anyone does, let me know. :-)
-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]>
Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA