On Tue, 15 May 2007, Cliff White wrote:
> Kilian CAVALOTTI wrote:
> >On Friday 04 May 2007 12:57:31 pm Don Bahls wrote:
> >>I'm wondering if there a standard way to test Lustre status from
> >>a client that won't lock up when there is an issue with Lustre.
> >
> >You can see the devices state in /proc/fs/lustre/devices, this should be
> >available even if the filesystem has problems.
> >
> >Cheers,
> You can:
> - cat /proc/mounts and check for your mount point -this will not hang
>   when servers are down
> - cat /proc/fs/lustre/health_check - check for the string 'healthy'
> - check /proc/fs/lustre/devices and look for 'UP'

Don,
Here's a simple Nagios plugin, adapted from the CFS Mon script, that we use
for monitoring Lustre health.
<begin check_lustre>
#!/usr/bin/perl -T
#
# $Id: check_lustre 131 2007-04-16 13:37:26Z bret $
# nagios plugin to check health of lustre filesystem:
# healthy: return 0
# otherwise: return 2 (Critical)
use strict;
my $health_check = '/proc/fs/lustre/health_check';
# this is based on the lustre Mon monitor from the lustre manual
# 1) lustre modules should be loaded
# Is the lustre module check necessary, since /proc/fs/lustre is already
# being checked?
# 2) lustre kernel directory should exist
if ( ! -d "/proc/fs/lustre" ) {
    print "no lustre kernel proc directory\n";
    exit 2;
}
# 3) health check must pass
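open ( my $health, '<', $health_check ) or do {
    print "cannot open $health_check: $!\n";
    exit 2;
};
while ( my $line = <$health> ) {
    if ( $line =~ /^healthy$/ ) {
        print "healthy\n";
        exit 0;
    }
    # anything other than "healthy" is critical: print the whole report
    print $line;
    while ( $line = <$health> ) {
        print $line;
    }
    exit 2;
}
# an empty health_check file must not fall through to an implicit exit 0
print "$health_check is empty\n";
exit 2;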
<end check_lustre>
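
The plugin above only covers the health_check item. The other two checks
Cliff lists (/proc/mounts and /proc/fs/lustre/devices) could be sketched in
shell roughly as follows. This is not part of our plugin: the function names
are made up for illustration, the optional file arguments exist only so the
functions can be tried against sample files, and I'm assuming the device state
is the second whitespace-separated field of each line in the devices file.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the file arguments default
# to the real /proc paths but can be overridden for testing.

# Return 0 if mount point $1 is listed as the second field of the mounts file.
# Reading /proc/mounts does not touch the filesystem itself.
lustre_is_mounted() {
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Return 0 only if the devices file is readable, lists at least one device,
# and every device line has state UP (ASSUMPTION: state is field 2).
lustre_devices_up() {
    devices_file="${1:-/proc/fs/lustre/devices}"
    [ -r "$devices_file" ] || return 2
    awk 'NF { n++; if ($2 != "UP") bad = 1 } END { exit (bad || n == 0) }' \
        "$devices_file"
}
```

From a Nagios wrapper you would call, for example,
`lustre_is_mounted /mnt/lustre || exit 2` (the mount point here is just an
example) and treat any non-zero return as Critical, the same way the Perl
plugin does.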
>
> cliffw
bret
_______________________________________________
Lustre-discuss mailing list
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
