On Jan 15, 2009 11:38 -0800, Jeffrey Alan Bennett wrote:
> I am using heartbeat V2. It works as expected, I just had to tune some
> timeouts, but it still takes around 3 minutes to completely move the
> MGS/MDS services to the other system.
This is largely an issue of Lustre failover itself, not the HA software.
The problem today is that under heavy load the clients may have to wait a
long time (hundreds of seconds in some cases) for requests sent to the
server to complete, so it is difficult for the clients to distinguish
between server death (unlikely) and heavy server load (common). When a
server dies and fails over, the clients first have to wait for their
requests to time out, then they resend and wait again (in the common case
the server is just overloaded), and only then do they try to contact
another server listed as a failover node for that target.

What we are looking at to improve failover speed is having the backup
server broadcast to the clients that it has taken over the OST/MDT as soon
as it has started. The clients would then be able to fail over to the new
server as soon as it is ready, instead of waiting for the original
requests to time out.

> My biggest concern is that I can't control the situation in which
> the HBA connectivity with the storage system is damaged, i.e. I pull the
> cables from the HBAs on the MGS/MDS and nothing happens: the MDS and MGS
> services keep running, they are still mounted, and therefore heartbeat
> does nothing. From the heartbeat "documentation" it does not seem that
> this can be done, at least not easily. I read something about HBA ping,
> and it seems it requires HBAAPI, which does not work with Brocade HBAs...

You can use HBA multipathing to avoid this problem, if your hardware
supports it. You can also use /proc/fs/lustre/health_check to check
whether the filesystems have encountered errors and are marked
"unhealthy".

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
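[Editor's note: the health_check file mentioned above can be polled from a
heartbeat monitor script. The sketch below is illustrative only, assuming
the standard /proc/fs/lustre/health_check location; the function name and
exit-code convention (0 = healthy, 1 = failed, as a heartbeat/OCF-style
monitor would expect) are this example's own, not from the original mail.]

```shell
#!/bin/sh
# Hedged sketch of a health probe for heartbeat: report failure when the
# Lustre health_check proc file is missing or not marked "healthy".
# The function name and exit codes are illustrative assumptions.
check_lustre_health() {
    # Allow overriding the path for testing; default is the standard
    # Lustre proc file on servers.
    health_file="${1:-/proc/fs/lustre/health_check}"

    if [ ! -r "$health_file" ]; then
        # Modules not loaded, or not a Lustre server: treat as failed.
        echo "lustre: health file missing"
        return 1
    fi

    # A healthy node reports "healthy"; an unhealthy one reports
    # "NOT HEALTHY" plus the affected devices.
    if grep -q '^healthy' "$health_file"; then
        echo "lustre: healthy"
        return 0
    fi

    echo "lustre: NOT HEALTHY"
    return 1
}
```

Wired into a heartbeat monitor (or cron-driven watchdog), a nonzero return
here would let the cluster manager fail the MGS/MDS resources over even
when the services themselves are still mounted and "running", which is the
pulled-HBA-cable case described above.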
