Jagga Soorma wrote:
> Hi Guys,
>
> My MDT is setup with LVM and I was able to test failover based on the
> Volume Group failing on my MDS (by unplugging both fibre cables).
> However, for my OST's, I have created filesystems directly on the SAN
> luns and when I unplug the fibre cables on my OSS, heartbeat does not
> detect failure for the filesystem since it shows as mounted. Is there
> somehow we can trigger a failure based on multipath failing on the OSS?
>
Hi,

It depends on the version of heartbeat you are using. Heartbeat v1 did not do any resource-level monitoring, so if that is what you are running you are out of luck. If you are using the v2 CRM and/or Pacemaker, you have two options:

1. Modify the monitor operation of the Filesystem OCF script so that it checks the actual health of the filesystem and/or multipath in addition to the status of the mount, and returns accordingly. The Filesystem OCF agent is located at /usr/lib/ocf/resource.d/heartbeat/Filesystem

2. Create your own resource agent that interacts with dm/multipath to start/stop/monitor it. Then constrain that resource to start before, stop after, and run on the same node as the Filesystem resource. The filesystem will then depend on the health of the multipath resource.

I recommend the second for the sake of thoroughness. Including multipath monitoring in the Filesystem OCF agent may "just work", but it leaves room for other multipath-related failures to go unnoticed. Writing your own OCF agent is fairly straightforward and is documented somewhere on www.clusterlabs.org. There is an OCF script that does the same for LVM, which would serve as a good example of what needs to be done.

Or maybe someone else has already created one? The Linux-HA or Pacemaker lists might be a good place to ask.

Good luck

--
: Adam Gandelman
: LINBIT | Your Way to High Availability
:
: http://www.linbit.com
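
As a rough illustration of the second option, the monitor action of a custom multipath agent could look something like the sketch below. It is only a sketch under assumptions: the function names and the `min_paths` parameter are made up for this example, the regular expression assumes path lines of the form `H:C:T:L sdX major:minor <state...>` as printed by `multipath -ll` (the exact layout differs between multipath-tools versions, so check it against your own output), and the start/stop/meta-data actions a real OCF agent needs are left out. The return values are the standard OCF exit codes.

```python
import re
import subprocess

# Standard OCF exit codes used by a monitor action.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

# Assumed shape of a path line in `multipath -ll` output, e.g.
#   | `- 3:0:0:0 sdb 8:16 active ready running
# Older releases print states as [active][ready]; both contain the same words.
PATH_LINE = re.compile(r"\d+:\d+:\d+:\d+\s+\S+\s+\d+:\d+\s+(?P<state>.*)$")


def count_paths(multipath_output):
    """Return (healthy, total) path counts parsed from `multipath -ll` text."""
    healthy = 0
    total = 0
    for line in multipath_output.splitlines():
        match = PATH_LINE.search(line)
        if match is None:
            continue  # header, size or path-group line
        total += 1
        state = match.group("state")
        if "active" in state and "ready" in state:
            healthy += 1
    return healthy, total


def monitor_status(multipath_output, min_paths=1):
    """Map parsed path health to an OCF exit code.

    min_paths is a hypothetical tunable: the number of healthy paths
    required before the resource is reported as healthy.
    """
    healthy, total = count_paths(multipath_output)
    if total == 0:
        return OCF_NOT_RUNNING   # no paths listed: map is not present
    if healthy < min_paths:
        return OCF_ERR_GENERIC   # map exists but too few usable paths
    return OCF_SUCCESS


def run_monitor(map_name, min_paths=1):
    """Query one multipath map and return the OCF exit code for it."""
    result = subprocess.run(
        ["multipath", "-ll", map_name],
        capture_output=True,
        text=True,
    )
    return monitor_status(result.stdout, min_paths)
```

With both fibre cables unplugged, every path line should report a failed state, so `monitor_status` returns `OCF_ERR_GENERIC` and the cluster manager can treat the multipath resource, and the Filesystem resource constrained to it, as failed.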
