On 3/14/12 9:20 AM, emmanuel segura wrote:
> Hello William
>
> I didn't know you were using DRBD, and I don't know what type of
> configuration you're using.
>
> It would be better to start clvmd with "clvmd -d"; that way we can
> see what the problem is.
For what it's worth, here's the output of running "clvmd -d" on the node
that stays up: <http://pastebin.com/sWjaxAEF>

What's probably important in that big mass of output are the last two
lines. Up to that point, I have both nodes up and running cman + clvmd;
cluster.conf is here: <http://pastebin.com/w5XNYyAX>

At the time of the next-to-last line, I cut power to the other node. At
the time of the last line, I run "vgdisplay" on the remaining node, and
it hangs forever.

After a lot of web searching, I found that I'm not the only one with
this problem. Here's one case that doesn't seem relevant to me, since I
don't use qdisk:
<http://www.redhat.com/archives/linux-cluster/2007-October/msg00212.html>
Here's one with the same problem on the same OS, but with no resolution:
<http://bugs.centos.org/view.php?id=5229>

Out of curiosity, has anyone on this list gotten a two-node cman+clvmd
cluster to work?

> On 14 March 2012 at 14:02, William Seligman <[email protected]> wrote:
>
>> On 3/14/12 6:02 AM, emmanuel segura wrote:
>>
>>> I think it's better if you make clvmd start at boot:
>>>
>>> chkconfig cman on ; chkconfig clvmd on
>>
>> I've already tried it. It doesn't work. The problem is that my LVM
>> information is on the DRBD device. If I start up clvmd before drbd,
>> it won't find the logical volumes.
>>
>> I also don't see why that would make a difference (although this
>> could be part of the confusion): a service is a service. I've tried
>> starting clvmd both inside and outside pacemaker control, with the
>> same problem. Why would starting clvmd at boot make a difference?
>>
>>> On 13 March 2012 at 23:29, William Seligman
>>> <[email protected]> wrote:
>>>
>>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
>>>>
>>>>> So if you're using cman, why do you use lsb::clvmd?
>>>>>
>>>>> I think you are very confused.
>>>>
>>>> I don't dispute that I may be very confused!
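Since the hang shows up whenever anything calls "vgdisplay", one defensive workaround on the monitoring side (a sketch only, and no fix for the underlying clvmd/dlm problem) is to wrap the call in coreutils `timeout`, so a stuck cluster lock becomes a clean monitor failure instead of an indefinite hang. The `check_vgs` name and the 10-second limit below are my own inventions for illustration; the command to run is a parameter so the sketch can be exercised with a stand-in:

```shell
# Hedged sketch: wrap the hang-prone "vgdisplay" call in a timeout so a
# stuck DLM lock produces a clean failure instead of hanging forever.
# "check_vgs" and the 10-second limit are invented for illustration.
check_vgs() {
    cmd="${1:-vgdisplay}"         # defaults to the real LVM command
    if timeout 10 "$cmd" >/dev/null 2>&1; then
        return 0                  # volume groups answered in time
    else
        echo "monitor: $cmd timed out or failed" >&2
        return 1                  # let the resource agent report failure
    fi
}
```

Inside an LSB status action, something like this would turn the "hangs forever" case into an ordinary monitor failure that pacemaker can act on, although it does nothing about why dlm blocks in the first place.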
>>>> However, from what I can tell, I still need to run clvmd even if
>>>> I'm running cman (I'm not using rgmanager). If I just run cman,
>>>> gfs2 and any other form of mount fails. If I run cman, then clvmd,
>>>> then gfs2, everything behaves normally.
>>>>
>>>> Going by these instructions:
>>>>
>>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>>>>
>>>> the resources he puts under "cluster control" (rgmanager) I have
>>>> to put under pacemaker control. Those include drbd, clvmd, and
>>>> gfs2.
>>>>
>>>> The difference between what I've got and what's in "Clusters From
>>>> Scratch" is that in CFS they assign one DRBD volume to a single
>>>> filesystem. I create an LVM physical volume on my DRBD resource,
>>>> as in the above tutorial, and so I have to start clvmd or the
>>>> logical volumes in the DRBD partition won't be recognized.
>>>>
>>>> Is there some way to get logical volumes recognized automatically
>>>> by cman without rgmanager that I've missed?
>>>>
>>>>> On 13 March 2012 at 22:42, William Seligman
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> On 3/13/12 12:29 PM, William Seligman wrote:
>>>>>>
>>>>>>> I'm not sure if this is a "Linux-HA" question; please direct
>>>>>>> me to the appropriate list if it's not.
>>>>>>>
>>>>>>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as
>>>>>>> described in "Clusters From Scratch." Fencing is through
>>>>>>> forcibly rebooting a node by cutting and restoring its power
>>>>>>> via UPS.
>>>>>>>
>>>>>>> My fencing/failover tests have revealed a problem. If I
>>>>>>> gracefully turn off one node ("crm node standby"; "service
>>>>>>> pacemaker stop"; "shutdown -r now"), all the resources
>>>>>>> transfer to the other node with no problems. If I cut power
>>>>>>> to one node (as would happen if it were fenced), the
>>>>>>> lsb::clvmd resource on the remaining node eventually fails.
>>>>>>> Since all the other resources depend on clvmd, all the
>>>>>>> resources on the remaining node stop, and the cluster is left
>>>>>>> with nothing running.
>>>>>>>
>>>>>>> I've traced why the lsb::clvmd resource fails: the
>>>>>>> monitor/status command includes "vgdisplay", which hangs
>>>>>>> indefinitely. Therefore the monitor will always time out.
>>>>>>>
>>>>>>> So this isn't a problem with pacemaker, but with clvmd/dlm:
>>>>>>> if a node is cut off, the cluster isn't handling it properly.
>>>>>>> Has anyone on this list seen this before? Any ideas?
>>>>>>>
>>>>>>> Details/versions:
>>>>>>> Redhat Linux 6.2 (kernel 2.6.32)
>>>>>>> cman-3.0.12.1
>>>>>>> corosync-1.4.1
>>>>>>> pacemaker-1.1.6
>>>>>>> lvm2-2.02.87
>>>>>>> lvm2-cluster-2.02.87
>>>>>>
>>>>>> This may be a Linux-HA question after all!
>>>>>>
>>>>>> I ran a few more tests. Here's the output from a typical test of
>>>>>>
>>>>>> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>>>>>>
>>>>>> <http://pastebin.com/uqC6bc1b>
>>>>>>
>>>>>> It looks like what's happening is that the fence agent (one I
>>>>>> wrote) is not returning the proper error code when a node
>>>>>> crashes. According to this page, if a fencing agent fails, GFS2
>>>>>> will freeze to protect the data:
>>>>>>
>>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>>>>>>
>>>>>> As a test, I tried to fence my test node via standard means:
>>>>>>
>>>>>> stonith_admin -F orestes-corosync.nevis.columbia.edu
>>>>>>
>>>>>> These were the log messages, which show that stonith_admin did
>>>>>> its job and CMAN was notified of the fencing:
>>>>>> <http://pastebin.com/jaH820Bv>
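For anyone reproducing that log filter, here it is run against a few sample lines. The log lines below are invented for illustration (they are not from the real /var/log/messages); the point is just which message sources the alternation pattern keeps:

```shell
# Hypothetical sample log lines, invented for illustration only.
cat > /tmp/sample-messages <<'EOF'
Mar 14 09:20:01 node1 fenced[1234]: fencing node node2
Mar 14 09:20:05 node1 kernel: dlm: closing connection to node 2
Mar 14 09:20:06 node1 kernel: fsid=cluster:gfs2_vol.0: jid=1: trying journal lock
Mar 14 09:20:07 node1 clvmd: Cluster LVM daemon started
Mar 14 09:21:00 node1 sshd[2345]: Accepted publickey for root
EOF

# Keep the dlm/gfs2/clvmd/fencing/syslogd lines, drop everything else
# (the pattern is alternation throughout: "|" between every term).
grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /tmp/sample-messages
```

This prints the first four sample lines and drops the unrelated sshd line.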
>>>>>> Unfortunately, I still got the gfs2 freeze, so this is not the
>>>>>> complete story.
>>>>>>
>>>>>> First things first. I vaguely recall a web page that went over
>>>>>> the STONITH return codes, but I can't locate it again. Is there
>>>>>> any reference for the return codes expected from a fencing
>>>>>> agent, perhaps as a function of the state of the fencing device?

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
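On the return-code question: I don't have the definitive reference either, but the common convention for RHCS fence agents is that options arrive on stdin as name=value lines, and the agent exits 0 on success and non-zero on failure. Below is a hedged skeleton along those lines; `fence_ups_skeleton` is an invented name, and the actual UPS power-control commands are stubbed out with echo:

```shell
# Hedged skeleton of a stdin-driven fence agent, assuming the common
# convention: options as name=value lines on stdin, status 0 on
# success, non-zero on failure.  "fence_ups_skeleton" is invented,
# and the real UPS calls are stubbed out with echo.
fence_ups_skeleton() {
    action="reboot"                     # conventional default action
    while IFS='=' read -r key value; do
        case "$key" in
            option|action) action="$value" ;;
            # a real agent would also read ipaddr=, login=, passwd=, ...
        esac
    done
    case "$action" in
        on|off|reboot)
            echo "stub: would drive UPS power here ($action)"
            return 0 ;;                 # report success to stonithd
        status|monitor)
            echo "stub: would query UPS state here"
            return 0 ;;
        *)
            echo "unknown action: $action" >&2
            return 1 ;;                 # anything else is a failure
    esac
}
```

The part that matters for the gfs2 freeze described above is that the agent must only report success after actually verifying the power state: a spurious 0 risks resuming on a node that isn't really dead, while a missing or wrong non-zero code can leave dlm/gfs2 blocked waiting on a fence that never "completes".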
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
