Is there a good way to detect a down MDS or OST from a
Lustre client? We have a basic sanity check that runs
at the start of each job and does a test like this (ksh syntax):
# check to see if the Lustre filesystem is mounted before running.
# ( where /wrkdir is a Lustre filesystem )
if [ ! -d "/wrkdir/$USER" ]; then
    # note the error and requeue the job.
    # (placeholder: the actual requeue step is site-specific)
    exit 1
fi
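Note that the [ -d ] test does not really check whether /wrkdir is mounted: it has to look the directory up through the Lustre client, which is the step that can block when an MDS or OST is unreachable. A minimal sketch that checks the local mount table instead, assuming a Linux client where /proc/mounts lists Lustre mounts with filesystem type "lustre"; lustre_mounted is a hypothetical helper name. This only tells you that the client has the filesystem mounted, not that every OST is reachable.

```sh
# lustre_mounted MOUNTPOINT [MOUNT_TABLE]
# Hypothetical helper: succeed if MOUNTPOINT is listed in the mount table
# with filesystem type "lustre". The table defaults to /proc/mounts (a Linux
# assumption); only the local kernel's table is read, so no Lustre server
# is contacted.
lustre_mounted() {
    mnt=$1
    table=${2:-/proc/mounts}
    # Fields in /proc/mounts: device, mount point, fs type, options, ...
    awk -v m="$mnt" '$2 == m && $3 == "lustre" { found = 1 } END { exit !found }' "$table"
}
```

For example, "if ! lustre_mounted /wrkdir; then ... fi" could run before the directory test, so a job is requeued immediately when the filesystem is not mounted at all.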
We had an issue with our Lustre filesystem a few days ago
that caused several of the OSTs to go down. Unfortunately,
the test above simply hung on the directory check when
this happened, which caused the script to time out.
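One way to keep such a check from blocking the job script is to run the directory test in the background and stop waiting after a fixed number of seconds. This is a minimal sketch in POSIX/ksh-compatible shell, not a Lustre tool: run_with_timeout is a hypothetical helper name, the return code 124 for a timeout is an arbitrary choice, and mktemp is assumed to exist on the client. It does not tell you which MDS or OST is down, only that the check did not finish in time.

```sh
# run_with_timeout SECONDS COMMAND [ARG...]
# Hypothetical helper: run COMMAND in the background and stop waiting after
# SECONDS. Returns COMMAND's exit status if it finishes in time, 124 if it
# does not, and 125 if the status file cannot be created.
run_with_timeout() {
    limit=$1
    shift
    status_file=$(mktemp) || return 125
    # The background subshell opens the status file first, then runs the
    # command, and writes its exit status only after the command returns.
    # An empty file therefore means "still running (or hung)".
    ( exec 3>"$status_file"; "$@"; echo "$?" >&3 ) &
    pid=$!
    elapsed=0
    while [ ! -s "$status_file" ]; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Timed out: give up without waiting on the child, which may stay
            # blocked until the servers recover. Assuming the child has already
            # started, a late write goes through its open descriptor and does
            # not recreate the removed file.
            rm -f "$status_file"
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    cmd_rc=$(cat "$status_file")
    rm -f "$status_file"
    wait "$pid" 2>/dev/null   # reap the finished child
    return "$cmd_rc"
}
```

In the job prologue this could replace the plain test, e.g. run_with_timeout 30 test -d "/wrkdir/$USER", treating a return of 124 as "Lustre not responding" and requeueing the job; the 30-second limit is only an example. A check that timed out is left running rather than waited on, so it may remain blocked on the client until the servers come back.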
I'm wondering if there is a standard way to test Lustre status from
a client that won't lock up when there is a problem with Lustre.
Thanks,
Don
Donald Bahls | HPC Specialist | Arctic Region Supercomputing Center
_______________________________________________
Lustre-discuss mailing list
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss