I've set up a simple script (well, I copied it from someone else and modified
it) to monitor stale NFS mounts. Some preliminary testing seemed to go okay, but
this problem crept up on me this weekend. The script is as follows:
#!/usr/bin/perl
if (@ARGV < 1) {
    print "Usage:\n";
    print "$0 <file to check with absolute path>\n";
    exit 1;
}
eval {
    local $SIG{ALRM} = sub {die "alarm\n"};
    alarm 2;
    $test = `ls $ARGV[0]`;
    alarm 0;
};
if ($@) {
    die unless $@ eq "alarm\n";
    # Timed out - error
    exit 1;
} else {
    # Okay
    exit 0;
}
However, on the machine that experienced the problem, `ps aux` showed several
hung `ls` processes, all in state D (uninterruptible sleep). Shouldn't the
alarm() timeout have made them exit? I call the script as follows:
# new_nfs_check.pl /opt/auctions/config/.DoNotDelete
The `ps aux` output follows:
202 22634 0.0 0.0 2752 732 pts/0 D Nov02 0:00 ls /opt/auctions/config/.DoNotDelete
202 23728 0.0 0.0 2752 732 pts/0 D Nov02 0:00 ls /opt/auctions/config/.DoNotDelete
202 24745 0.0 0.0 2752 732 pts/0 D Nov02 0:00 ls /opt/auctions/config/.DoNotDelete
202 26054 0.0 0.0 2748 728 pts/0 D 00:08 0:00 ls /opt/auctions/config/.DoNotDelete
202 26959 0.0 0.0 2748 728 pts/0 D 00:18 0:00 ls /opt/auctions/config/.DoNotDelete
202 27742 0.0 0.0 2752 732 pts/0 D 00:28 0:00 ls /opt/auctions/config/.DoNotDelete
202 28748 0.0 0.0 2748 728 pts/0 D 00:38 0:00 ls /opt/auctions/config/.DoNotDelete
202 29767 0.0 0.0 2748 728 pts/0 D 00:48 0:00 ls /opt/auctions/config/.DoNotDelete
202 30410 0.0 0.0 2748 728 pts/0 D 00:58 0:00 ls /opt/auctions/config/.DoNotDelete
202 31508 0.0 0.0 2748 728 pts/0 D 01:08 0:00 ls /opt/auctions/config/.DoNotDelete
202 31635 0.0 0.0 2748 728 pts/0 D 01:10 0:00 ls /opt/auctions/config/.DoNotDelete
202 31648 0.0 0.0 2752 732 pts/0 D 01:10 0:00 ls /opt/auctions/config/.DoNotDelete
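One possible reading of the listing above is that alarm() only interrupts the
Perl process, while the `ls` that the backticks spawned keeps running. Below is
a minimal sketch of a variant that forks explicitly, so the child's PID is known
and can be signalled on timeout. The helper name check_with_timeout is
hypothetical, not something from the original script. A process in state D
generally does not act on a signal until its NFS call returns, so this
illustrates the pattern and is not a guaranteed fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Hypothetical helper: run @cmd in a forked child and wait at most
# $timeout seconds. Returns 1 if the command exited with status 0
# in time, 0 on a non-zero exit or a timeout.
sub check_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: discard output, then replace this process with the command.
        open STDOUT, '>', '/dev/null';
        open STDERR, '>', '/dev/null';
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    # Parent: wait for the child, but give up when the alarm fires.
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        waitpid($pid, 0);
        alarm 0;
        $? == 0 ? 1 : 0;
    };

    if (!defined $ok) {
        die $@ unless $@ eq "alarm\n";
        # Timed out: signal the child, then reap it only if it has already
        # exited. A child stuck in state D stays until its I/O returns,
        # so never block on it here.
        kill 'KILL', $pid;
        waitpid($pid, WNOHANG);
        return 0;
    }
    return $ok;
}
```

With this in place, the check itself could be written as
`exit(check_with_timeout(2, 'ls', $ARGV[0]) ? 0 : 1);`, keeping the same exit
codes as the script above.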
Anyone have ideas? This node is a SLES9 box (which, as I understand it, has
issues with either the kernel or nfs-utils).
Matt