On 3/15/12 11:50 AM, emmanuel segura wrote:
> yes william
>
> Now try clvmd -d and see what happen
>
> locking_type = 3 it's lvm cluster lock type
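(For anyone reading this in the archive: the two lvm.conf values Emmanuel is
referring to are the ones I quote further down. Shown here for context; the
values are from my config, but the surrounding "global" section is just the
usual lvm.conf layout, nothing special to this cluster.)

    global {
        # Type 3 = built-in clustered locking via clvmd.
        locking_type = 3
        # 0 = never fall back to local file-based locking if cluster
        # locking can't be initialised (e.g. clvmd isn't running).
        fallback_to_local_locking = 0
    }
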
Since you asked for confirmation, here it is: the output of 'clvmd -d' just
now: <http://pastebin.com/bne8piEw>. I crashed the other node at Mar 15
12:02:35, which is when you see the only additional line of output. I don't
see any particular difference between this and the previous result
<http://pastebin.com/sWjaxAEF>, which suggests that I had cluster locking
enabled before, and still do now.

> On 15 March 2012 16:15, William Seligman <[email protected]> wrote:
>
>> On 3/15/12 5:18 AM, emmanuel segura wrote:
>>
>>> The first thing i seen in your clvmd log it's this
>>>
>>> =============================================
>>> WARNING: Locking disabled. Be careful! This could corrupt your metadata.
>>> =============================================
>>
>> I saw that too, and thought the same as you did. I did some checks (see
>> below), but some web searches suggest that this message is a normal
>> consequence of clvmd initialization; e.g.,
>>
>> <http://markmail.org/message/vmy53pcv52wu7ghx>
>>
>>> use this command
>>>
>>> lvmconf --enable-cluster
>>>
>>> and remember for cman+pacemaker you don't need qdisk
>>
>> Before I tried your lvmconf suggestion, here was my /etc/lvm/lvm.conf:
>> <http://pastebin.com/841VZRzW> and the output of "lvm dumpconfig":
>> <http://pastebin.com/rtw8c3Pf>.
>>
>> Then I did as you suggested, but with a check to see if anything changed:
>>
>> # cd /etc/lvm/
>> # cp lvm.conf lvm.conf.cluster
>> # lvmconf --enable-cluster
>> # diff lvm.conf lvm.conf.cluster
>> #
>>
>> So the key lines have been there all along:
>> locking_type = 3
>> fallback_to_local_locking = 0
>>
>>> On 14 March 2012 23:17, William Seligman <[email protected]> wrote:
>>>
>>>> On 3/14/12 9:20 AM, emmanuel segura wrote:
>>>>> Hello William
>>>>>
>>>>> i did new you are using drbd and i dont't know what type of
>>>>> configuration you using
>>>>>
>>>>> But it's better you try to start clvm with clvmd -d
>>>>>
>>>>> like thak we can see what it's the problem
>>>>
>>>> For what it's worth, here's the output of running clvmd -d on the node
>>>> that stays up: <http://pastebin.com/sWjaxAEF>
>>>>
>>>> What's probably important in that big mass of output are the last two
>>>> lines. Up to that point, I have both nodes up and running cman + clvmd;
>>>> cluster.conf is here: <http://pastebin.com/w5XNYyAX>
>>>>
>>>> At the time of the next-to-last line, I cut power to the other node.
>>>>
>>>> At the time of the last line, I ran "vgdisplay" on the remaining node,
>>>> which hangs forever.
>>>>
>>>> After a lot of web searching, I found that I'm not the only one with
>>>> this problem. Here's one case that doesn't seem relevant to me, since
>>>> I don't use qdisk:
>>>> <http://www.redhat.com/archives/linux-cluster/2007-October/msg00212.html>.
>>>> Here's one with the same problem on the same OS:
>>>> <http://bugs.centos.org/view.php?id=5229>, but with no resolution.
>>>>
>>>> Out of curiosity, has anyone on this list made a two-node cman+clvmd
>>>> cluster work for them?
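
(A side note for the archive, since "vgdisplay hangs forever" is the visible
symptom: one way to tell whether the hang is clvmd itself or dlm waiting on
fencing is to poke at the cluster state directly. This is only a rough sketch,
assuming the fence_tool/dlm_tool utilities that ship with cman are on the
node; I'm not pasting their output here.)

    # Don't let a diagnostic command hang the shell indefinitely.
    timeout 30 vgdisplay || echo "vgdisplay is still blocked"

    # Is the fence domain still waiting for the dead node to be fenced?
    fence_tool ls

    # Is the clvmd lockspace stopped, pending fencing/recovery?
    dlm_tool ls
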
>>>>> On 14 March 2012 14:02, William Seligman <[email protected]> wrote:
>>>>>
>>>>>> On 3/14/12 6:02 AM, emmanuel segura wrote:
>>>>>>
>>>>>>> I think it's better you make clvmd start at boot
>>>>>>>
>>>>>>> chkconfig cman on ; chkconfig clvmd on
>>>>>>
>>>>>> I've already tried it. It doesn't work. The problem is that my LVM
>>>>>> information is on the drbd. If I start up clvmd before drbd, it won't
>>>>>> find the logical volumes.
>>>>>>
>>>>>> I also don't see why that would make a difference (although this could
>>>>>> be part of the confusion): a service is a service. I've tried starting
>>>>>> up clvmd inside and outside pacemaker control, with the same problem.
>>>>>> Why would starting clvmd at boot make a difference?
>>>>>>
>>>>>>> On 13 March 2012 23:29, William Seligman <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
>>>>>>>>
>>>>>>>>> So if you using cman why you use lsb::clvmd
>>>>>>>>>
>>>>>>>>> I think you are very confused
>>>>>>>>
>>>>>>>> I don't dispute that I may be very confused!
>>>>>>>>
>>>>>>>> However, from what I can tell, I still need to run clvmd even if
>>>>>>>> I'm running cman (I'm not using rgmanager). If I just run cman,
>>>>>>>> gfs2 and any other form of mount fails. If I run cman, then clvmd,
>>>>>>>> then gfs2, everything behaves normally.
>>>>>>>>
>>>>>>>> Going by these instructions:
>>>>>>>>
>>>>>>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>>>>>>>>
>>>>>>>> the resources he puts under "cluster control" (rgmanager) I have to
>>>>>>>> put under pacemaker control. Those include drbd, clvmd, and gfs2.
>>>>>>>>
>>>>>>>> The difference between what I've got, and what's in "Clusters From
>>>>>>>> Scratch", is that in CFS they assign one DRBD volume to a single
>>>>>>>> filesystem. I create an LVM physical volume on my DRBD resource,
>>>>>>>> as in the above tutorial, and so I have to start clvmd or the
>>>>>>>> logical volumes in the DRBD partition won't be recognized.
>>>>>>>>
>>>>>>>> Is there some way to get logical volumes recognized automatically
>>>>>>>> by cman without rgmanager that I've missed?
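
(For completeness, since the start-up ordering keeps coming up: under
pacemaker this is the kind of constraint chain I mean, written in crm shell
syntax. The resource names here - ms_drbd, cl_clvmd, cl_gfs2 - are
placeholders, not my actual CIB; the only point is that clvmd starts after
the DRBD master is promoted, and gfs2 mounts only after clvmd is up.)

    # Promote DRBD before starting the clvmd clone, and keep clvmd with
    # the DRBD master; then mount gfs2 only where clvmd is running.
    order o_drbd_before_clvmd inf: ms_drbd:promote cl_clvmd:start
    colocation c_clvmd_on_drbd_master inf: cl_clvmd ms_drbd:Master
    order o_clvmd_before_gfs2 inf: cl_clvmd cl_gfs2
    colocation c_gfs2_with_clvmd inf: cl_gfs2 cl_clvmd
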
>>>>>>>>> On 13 March 2012 22:42, William Seligman <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On 3/13/12 12:29 PM, William Seligman wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm not sure if this is a "Linux-HA" question; please direct
>>>>>>>>>>> me to the appropriate list if it's not.
>>>>>>>>>>>
>>>>>>>>>>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as
>>>>>>>>>>> described in "Clusters From Scratch." Fencing is through
>>>>>>>>>>> forcibly rebooting a node by cutting and restoring its power
>>>>>>>>>>> via UPS.
>>>>>>>>>>>
>>>>>>>>>>> My fencing/failover tests have revealed a problem. If I
>>>>>>>>>>> gracefully turn off one node ("crm node standby"; "service
>>>>>>>>>>> pacemaker stop"; "shutdown -r now"), all the resources
>>>>>>>>>>> transfer to the other node with no problems. If I cut power
>>>>>>>>>>> to one node (as would happen if it were fenced), the
>>>>>>>>>>> lsb::clvmd resource on the remaining node eventually fails.
>>>>>>>>>>> Since all the other resources depend on clvmd, all the
>>>>>>>>>>> resources on the remaining node stop and the cluster is left
>>>>>>>>>>> with nothing running.
>>>>>>>>>>>
>>>>>>>>>>> I've traced why the lsb::clvmd fails: the monitor/status
>>>>>>>>>>> command includes "vgdisplay", which hangs indefinitely.
>>>>>>>>>>> Therefore the monitor will always time out.
>>>>>>>>>>>
>>>>>>>>>>> So this isn't a problem with pacemaker, but with clvmd/dlm:
>>>>>>>>>>> if a node is cut off, the cluster isn't handling it properly.
>>>>>>>>>>> Has anyone on this list seen this before? Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Details:
>>>>>>>>>>>
>>>>>>>>>>> versions:
>>>>>>>>>>> Redhat Linux 6.2 (kernel 2.6.32)
>>>>>>>>>>> cman-3.0.12.1
>>>>>>>>>>> corosync-1.4.1
>>>>>>>>>>> pacemaker-1.1.6
>>>>>>>>>>> lvm2-2.02.87
>>>>>>>>>>> lvm2-cluster-2.02.87
>>>>>>>>>>
>>>>>>>>>> This may be a Linux-HA question after all!
>>>>>>>>>>
>>>>>>>>>> I ran a few more tests. Here's the output from a typical test of
>>>>>>>>>>
>>>>>>>>>> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>>>>>>>>>>
>>>>>>>>>> <http://pastebin.com/uqC6bc1b>
>>>>>>>>>>
>>>>>>>>>> It looks like what's happening is that the fence agent (one I
>>>>>>>>>> wrote) is not returning the proper error code when a node
>>>>>>>>>> crashes. According to this page, if a fencing agent fails, GFS2
>>>>>>>>>> will freeze to protect the data:
>>>>>>>>>>
>>>>>>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>>>>>>>>>>
>>>>>>>>>> As a test, I tried to fence my test node via standard means:
>>>>>>>>>>
>>>>>>>>>> stonith_admin -F orestes-corosync.nevis.columbia.edu
>>>>>>>>>>
>>>>>>>>>> These were the log messages, which show that stonith_admin did
>>>>>>>>>> its job and CMAN was notified of the fencing:
>>>>>>>>>> <http://pastebin.com/jaH820Bv>.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, I still got the gfs2 freeze, so this is not the
>>>>>>>>>> complete story.
>>>>>>>>>>
>>>>>>>>>> First things first. I vaguely recall a web page that went over
>>>>>>>>>> the STONITH return codes, but I can't locate it again. Is there
>>>>>>>>>> any reference to the return codes expected from a fencing
>>>>>>>>>> agent, perhaps as a function of the state of the fencing device?
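
(Partly answering my own question, with the caveat that this is only my
reading of the FenceAgentAPI conventions and not an authoritative reference:
the agent gets its options as key=value lines on stdin, and the exit status
is the contract - 0 only when the requested action really succeeded,
non-zero otherwise. A hypothetical, stripped-down skeleton of the shape I
mean, with everything UPS-specific left out:)

    #!/bin/bash
    # Hypothetical fence agent skeleton, NOT my real UPS agent.
    # fenced passes options as key=value lines on stdin; the exit status
    # is what the cluster acts on: 0 = action verifiably succeeded.

    action="reboot"   # conventional default when no action is given

    # Read key=value pairs from stdin.
    while read -r line; do
        case "$line" in
            action=*|option=*) action="${line#*=}" ;;
            port=*)            port="${line#*=}" ;;
        esac
    done

    case "$action" in
        off|reboot)
            # ...talk to the UPS here for outlet "$port"...
            # Only report success if we can VERIFY the outlet is now off:
            # "I sent the command" is not enough, because GFS2/DLM resume
            # as soon as fencing is reported successful.
            verified_off=false
            $verified_off && exit 0
            exit 1
            ;;
        on|status|monitor)
            # ...query or switch the UPS here; again, exit 0 only on
            # verified success.
            exit 1
            ;;
        *)
            exit 1
            ;;
    esac
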
-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
