On Fri, 2009-12-11 at 02:01 +1100, Atul Vidwansa wrote:
> When I reboot MDS nodes and start MDTs with "service heartbeat start"
> simultaneously on both mds nodes, sometimes I get following message:
With both nodes up and running at the same time, they have likely both done a vgscan; vgchange -a y on the shared disk(s). I don't know that this is in itself a problem. I do the same thing here and I have not (yet) seen any ill effects. I am far from an LVM expert, however.

> mds1: 2009/12/10_13:48:08 CRITICAL: Resource LVM::mgsvg is active, and
> should not be!
> mds1: 2009/12/10_13:48:08 CRITICAL: Non-idle resources can affect data
> integrity!

I wonder how it's determining that LVM::mgsvg is "active" and what it considers "active". A look into the source for that would most likely be very fruitful.

And it was. It seems that "/usr/lib/ocf/resource.d/heartbeat/LVM status" is what is used to determine who owns the resource. The LVM resource script does that with a:

    vgdisplay [-v if lvm version is >= 2 ] $volume 2>&1 | grep -i 'Status[ \t]*available'

What is interesting is that on my LVM 2 system, vgdisplay with -v also shows a:

    LV Status available

for every volume in the VG. I wonder if they are just not accounting for that. Or maybe that's what they are looking for, given that on my active and in-use LVM system here, the Status for the VG itself shows:

    VG Status resizable

So they can't be looking for an "available" in the VG Status to determine "resource ownership", and must want the LV Status line(s).

Looking a little further, the LVM script has both "start" and "stop" actions, which heartbeat presumably invokes to "own" and "disown" a resource. These two actions do:

    vgscan; vgchange -a y $1

and

    vgchange -a n $1

respectively. That implies that heartbeat wants to own an entire VG or nothing. It would appear you cannot have multiple volumes from a single VG owned by different nodes. As I said, I do this myself and have found no issues, but I am not at all a heavy user, or what I would call a "production" one.

> and heartbeat on both mds nodes does not start any resource (even after
> waiting for 35 minutes).
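To make the status check discussed above a little more concrete, here is a rough Python approximation of the grep the LVM agent runs over vgdisplay -v output. It is not the agent's own code: the sample vgdisplay output is made up and abridged, and the helper names (matching_lines, looks_active) are hypothetical. It is written in Python rather than shell only so it can be run without an LVM setup. The point it illustrates is that the "VG Status resizable" line never matches; only an "LV Status available" line does.

```python
import re

# Approximation of the pattern the heartbeat LVM agent's "status" action uses:
#   vgdisplay -v $volume 2>&1 | grep -i 'Status[ \t]*available'
# Here [ \t] is treated as "space or tab"; this mimics the shell pipeline,
# it is not the agent's actual implementation.
STATUS_RE = re.compile(r"Status[ \t]*available", re.IGNORECASE)

# Hypothetical, abridged vgdisplay -v output for a VG whose LV is active.
ACTIVE_SAMPLE = """\
  --- Volume group ---
  VG Name               mgsvg
  VG Status             resizable
  --- Logical volume ---
  LV Name                /dev/mgsvg/mgs
  LV Status              available
"""

# Hypothetical output for the same VG with its LV deactivated
# (e.g. after "vgchange -a n mgsvg").
INACTIVE_SAMPLE = ACTIVE_SAMPLE.replace("available", "NOT available")


def matching_lines(vgdisplay_output):
    """Return the lines the grep would print, with surrounding whitespace stripped."""
    return [
        line.strip()
        for line in vgdisplay_output.splitlines()
        if STATUS_RE.search(line)
    ]


def looks_active(vgdisplay_output):
    """Assumption: any grep match is taken to mean 'this node has the resource active'."""
    return bool(matching_lines(vgdisplay_output))


if __name__ == "__main__":
    for name, sample in (("active", ACTIVE_SAMPLE), ("inactive", INACTIVE_SAMPLE)):
        print(name, looks_active(sample), matching_lines(sample))
```

Run as-is, the active sample reports True with the LV Status line as its only match, while the inactive sample reports False with no matches, since "NOT" sits between "Status" and "available".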
Well, it would seem that heartbeat has found a condition it considers dangerous and is stopping there so as not to cause any damage. From the looks of things, you will need to disable the operating system's LVM startup code and leave it to heartbeat to manage, if you buy into their assumptions. It might be worth a question or two on the LVM list to see whether those assumptions are valid, or else resign yourself to letting heartbeat handle LVM resource ownership at the VG level rather than the LV level.

Cheers,
b.
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
