On Jan 15, 2009 17:51 -0800, Jeffrey Alan Bennett wrote:
> > You can use HBA multi-pathing to avoid this problem, if your
> > hardware supports it. You can also use
> > /proc/fs/lustre/health_check to check if the filesystems have
> > encountered errors and are marked "unhealthy".
>
> We use multipath in all our configurations. However, will Lustre
> be able to detect if the connectivity to the storage has been
> totally lost ( ie. no available path ) and display accordingly on
> /proc/fs/lustre/health_check?
Yes, but currently it can only do this "reactively" rather than "proactively": health_check reports a problem only after Lustre has actually hit an IO error, not by monitoring the storage paths themselves. If you are using MMP (multiple-mount protection), it should detect the IO error and mark the filesystem read-only within a second or so (depending on how quickly the SCSI layer returns the error rather than retrying), which in turn will cause health_check to report "unhealthy". However, any other filesystem IO in progress at that time will also get an IO error, and that error will be returned to the client.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
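For anyone polling health_check from a monitoring script, a minimal sketch follows. It assumes the file sits at the path named in this thread and that a healthy node reports exactly the word "healthy"; the function names `parse_health` and `lustre_is_healthy` are hypothetical, and any other content (or an unreadable file) is treated as a failure. Because the check is reactive, it only reflects errors Lustre has already encountered.

```python
from pathlib import Path

# Path as given in the thread (assumption: it may differ on other Lustre versions).
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")


def parse_health(text: str) -> bool:
    """Return True only if the file content is exactly 'healthy'.

    An exact match is used because "healthy" is a substring of "unhealthy",
    so a simple substring test would wrongly report an unhealthy node as OK.
    """
    return text.strip() == "healthy"


def lustre_is_healthy(path: Path = HEALTH_CHECK) -> bool:
    """Read the health_check file; a missing or unreadable file counts as unhealthy."""
    try:
        return parse_health(path.read_text())
    except OSError:
        return False
```

A cron job or monitoring agent could call `lustre_is_healthy()` and raise an alert when it returns False; treating read errors as unhealthy errs on the side of alerting.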
