Hi, yesterday we had another bad experience, one that has happened a couple of
times before. We have an AFS file server, running AIX 325 with AFS 3.3a, that
goes into a mode where:
- fs checks - reports ALL servers are running
- users with volumes on the dead server are dead
- they get NO response from their aixterms
- we narrow the problem down by logging into ALL the servers
- the one that does NOT allow us in, is always the one with the problem
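
A rough sketch of how the manual "log into every server" check above could be
scripted. Everything here is an assumption for illustration: the function names
`can_connect` and `find_unresponsive`, the TCP port (23, telnet), and the
timeout. A TCP connect can also succeed on a machine whose login still hangs, so
this is only a first filter, not a replacement for the login test.

```python
import socket


def can_connect(host, port=23, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 23 (telnet) is an assumed stand-in for "can we log in".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def find_unresponsive(hosts, probe=can_connect):
    """Return the hosts for which the probe fails, in input order."""
    return [h for h in hosts if not probe(h)]
```

Called as `find_unresponsive(["fs1", "fs2", "fs3"])` (hypothetical host names),
it returns the servers that did not accept a connection.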
The way we fix the problem, without rebooting the server, is:
- bos shutdown <server> -localauth
- all processes shut down EXCEPT for the 'Instance fs'
- we have to KILL the 'fileserver' process via
  - bos exec <server> "(process command)" -localauth
  - bos exec <server> "kill -9 PID #" -localauth
- bos restart <server> -all -localauth
- after the server salvages, all is fine again
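
The recovery steps above could be captured in a small helper that only builds
the command lines, so they can be reviewed before anyone runs them. This is a
sketch: the name `recovery_commands` and its parameters are hypothetical, the
`bos` invocations are copied from the steps above, and the process-listing
command is left to the caller because it is not spelled out here.

```python
def recovery_commands(server, ps_command, fileserver_pid=None):
    """Build (not run) the bos command lines from the steps above, as argv lists.

    ps_command is whatever process-listing command is used on the server;
    it is not specified in the post, so the caller must supply it.
    The PID is only known after the listing step, so call this once without
    fileserver_pid, then again with it.
    """
    cmds = [
        ["bos", "shutdown", server, "-localauth"],
        # List processes on the server to find the surviving fileserver PID.
        ["bos", "exec", server, ps_command, "-localauth"],
    ]
    if fileserver_pid is not None:
        pid = int(fileserver_pid)  # reject anything that is not a plain number
        cmds.append(["bos", "exec", server, f"kill -9 {pid}", "-localauth"])
        cmds.append(["bos", "restart", server, "-all", "-localauth"])
    return cmds
```

Each inner list could be handed to `subprocess.run` on a machine where `bos` is
installed; the helper itself never executes anything.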
Here are the last 4 lines of '/usr/afs/logs/FileLog' at the time the last
incident happened:
----start log---
Tue Oct 15 10:17:47 1996 There are 1636 connections, process size 529279
Tue Oct 15 10:17:47 1996 There are 371 workstations, 241 are active (req in < 15
mins), 0 marked "down"
Tue Oct 15 10:17:47 1996 VShutdown: shutting down on-line volumes...
Tue Oct 15 10:19:15 1996 CB: RCallBack (zero fid probe in host.c) failed for host
IP_ADDRESS_DELETED.7001
----end log---
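
To compare incidents, the timestamps in an excerpt like the one above can be
pulled out mechanically. The sketch below assumes every entry starts with a
24-character timestamp in the format shown, treats any other line as the wrapped
tail of the previous entry, and uses hypothetical names (`parse_filelog`,
`seconds_after_shutdown`). Feed it only the lines between the start and end
markers.

```python
from datetime import datetime

STAMP_LEN = len("Tue Oct 15 10:17:47 1996")  # 24 characters
STAMP_FMT = "%a %b %d %H:%M:%S %Y"


def parse_filelog(lines):
    """Return (timestamp, message) pairs; wrapped lines join the previous entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        try:
            ts = datetime.strptime(line[:STAMP_LEN], STAMP_FMT)
        except ValueError:
            if entries:  # no timestamp: continuation of a wrapped line
                t, msg = entries[-1]
                entries[-1] = (t, msg + " " + line.strip())
            continue
        entries.append((ts, line[STAMP_LEN:].strip()))
    return entries


def seconds_after_shutdown(entries):
    """Seconds from the first VShutdown entry to the last entry, or None."""
    starts = [t for t, msg in entries if msg.startswith("VShutdown")]
    if not starts:
        return None
    return (entries[-1][0] - starts[0]).total_seconds()
```

On the excerpt above this gives four entries, with 88 seconds between the
VShutdown message and the failed callback.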
We would appreciate any help or suggestions for finding the root of the
problem.
Thanks,
--
Sal