On Tue, 15 May 2007, Cliff White wrote:

> Kilian CAVALOTTI wrote:
> >On Friday 04 May 2007 12:57:31 pm Don Bahls wrote:
> >>I'm wondering if there is a standard way to test Lustre status from
> >>a client that won't lock up when there is an issue with Lustre.
> >
> >You can see the device states in /proc/fs/lustre/devices; this should be
> >available even if the filesystem has problems.
> >
> >Cheers,
> You can:
> - cat /proc/mounts and check for your mount point - this will not hang
> when servers are down
> - cat /proc/fs/lustre/health_check - check for the string 'healthy'
> - check /proc/fs/lustre/devices and look for 'UP'
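
For a quick look from a shell, the three checks listed above can be strung
together roughly as follows. This is only a sketch: the function name
lustre_quick_check and its arguments are made up for illustration, and it
assumes /proc/mounts, health_check and devices look the way they are
described in this thread. It only asks for at least one device marked UP;
a stricter check would look at every line.

```shell
#!/bin/sh
# Sketch only: lustre_quick_check is a hypothetical helper, not part of Lustre.
# Usage: lustre_quick_check MOUNTPOINT [MOUNTS_FILE] [LUSTRE_PROC_DIR]
# The file arguments default to the /proc paths named in the thread; they are
# parameters only so the function can be tried against sample files.
lustre_quick_check() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    procdir=${3:-/proc/fs/lustre}

    # 1) the mount point should be listed in the mounts file (second field)
    if ! awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts"; then
        echo "$mnt is not mounted"
        return 2
    fi

    # 2) health_check should contain the line 'healthy'
    if ! grep -qx 'healthy' "$procdir/health_check"; then
        echo "health_check does not report healthy"
        return 2
    fi

    # 3) devices should list at least one device marked UP
    if ! grep -qw 'UP' "$procdir/devices"; then
        echo "no device in state UP"
        return 2
    fi

    echo "healthy"
    return 0
}
```

Called as "lustre_quick_check /your/mount/point", it prints one line and
returns 0 when all three checks pass and 2 otherwise, the same convention
the plugin in this message uses.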

Don,

Here's a simple Nagios plugin, adapted from the CFS Mon script, that we use
to monitor Lustre health.

<begin check_lustre>

#!/usr/bin/perl -T
#
# $Id: check_lustre 131 2007-04-16 13:37:26Z bret $

# nagios plugin to check health of lustre filesystem:
#     healthy:   return 0
#     otherwise: return 2 (Critical)

use strict;
use warnings;

my $health_check = '/proc/fs/lustre/health_check';

# this is based on the lustre Mon monitor from the lustre manual

# 1) lustre modules should be loaded

# Is the lustre module check necessary, since /proc/fs/lustre is already 
# being checked?

# 2) lustre kernel directory should exist

if ( ! -d "/proc/fs/lustre" ) {
    print "no lustre kernel proc directory\n";
    exit 2;
}

# 3) health check must pass

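open ( my $health, '<', $health_check ) or do {
    print "cannot open $health_check: $!\n";
    exit 2;
};

while ( <$health> ) {
    if ( /^healthy$/ ) {
        print "healthy\n";
        exit 0;
    } else {
        # not healthy: print the whole report and return Critical
        print $_;
        while ( <$health> ) {
            print $_;
        }
        exit 2;
    }
}

# an empty health_check file must not fall through to exit status 0
print "$health_check is empty\n";
exit 2;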

<end check_lustre>
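
To try the plugin by hand, run it and look at the exit status, since that is
what Nagios acts on. A small sketch follows, assuming the script was saved as
./check_lustre; that path and the helper name nagios_state are only for
illustration. The meanings of 0 to 3 are the standard Nagios plugin return
codes.

```shell
#!/bin/sh
# Sketch only: nagios_state is a hypothetical helper for reading plugin results.
# It maps a plugin exit status to the state name Nagios uses for it.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of range" ;;
    esac
}

# Example run (assumes the plugin was saved as ./check_lustre; -T matches the
# taint flag on its #! line):
#   output=$(perl -T ./check_lustre); nagios_state $?
```

The plugin above only ever returns 0 or 2, so by hand you should see either
OK or CRITICAL.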

> 
> cliffw

bret

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss