Hi, yesterday we had another bad experience, one that has happened a couple of
times before. We have an AFS file server running AIX 3.2.5 with AFS 3.3a, and
it goes into a mode where:

   - fs checks - reports ALL servers are running
   - users with volumes on the dead server are dead
     - they get NO response from their aixterms
   - we narrow the problem down by logging into ALL the servers
     - the one that does NOT allow us in is always the one with the problem

The way we fix the problem without rebooting the server is:

   - bos shutdown <server> -localauth
     - all processes SHUTDOWN - EXCEPT for the 'Instance fs'
       - we have to KILL the 'fileserver' process via
         - bos exec <server> "(process command)" -localauth
         - bos exec <server> "kill -9 PID #" -localauth
   - bos restart <server> -all -localauth
   - after the server salvages, all is fine again
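
For reference, the sequence above could be captured in a small shell function.
This is only a sketch: the function name 'recover_plan' and its two arguments
(server name and fileserver PID) are made up for illustration, and it just
prints the commands rather than running them, since the PID still has to be
found by hand first.

```shell
#!/bin/sh
# Sketch of the recovery sequence from the list above.
# It only PRINTS the commands; nothing is executed against a server.

recover_plan() {
    server=$1   # hypothetical argument: name of the hung file server
    pid=$2      # hypothetical argument: PID of the stuck 'fileserver' process

    # 1. Shut down all bos-controlled processes on the server.
    printf 'bos shutdown %s -localauth\n' "$server"
    # 2. Kill the fileserver process that did not shut down.
    printf 'bos exec %s "kill -9 %s" -localauth\n' "$server" "$pid"
    # 3. Restart everything; the server salvages on the way up.
    printf 'bos restart %s -all -localauth\n' "$server"
}
```

Calling it as 'recover_plan <server> <PID>' prints the three commands in order,
so they can be reviewed before being pasted into a shell.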

Here are the last 4 lines of '/usr/afs/logs/FileLog' from the time of the last
incident:

----start log---
Tue Oct 15 10:17:47 1996 There are 1636 connections, process size 529279
Tue Oct 15 10:17:47 1996 There are 371 workstations, 241 are active (req in < 15 mins), 0 marked "down"
Tue Oct 15 10:17:47 1996 VShutdown:  shutting down on-line volumes...
Tue Oct 15 10:19:15 1996 CB: RCallBack (zero fid probe in host.c) failed for host IP_ADDRESS_DELETED.7001
----end log---

I would appreciate any help or suggestions for finding the root of the
problem.

Thanks,
--
Sal
