OK William, let's try to understand what happens when clvmd hangs.
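For reference, the change goes in the log section of /etc/lvm/lvm.conf. A minimal sketch of what that section should end up looking like (based on stock lvm2 defaults of that era; the surrounding settings in your file may differ):

```
log {
    # Keep sending messages to syslog as well (the default).
    syslog = 1

    # Uncomment this line so debug output also goes to a file:
    file = "/var/log/lvm2.log"

    # Append to the log file instead of overwriting it on restart.
    overwrite = 0

    # 7 is the most detailed debug level; 0 disables debug logging.
    level = 7
}
```

After changing it, reproduce the hang and look at /var/log/lvm2.log on the surviving node.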
Edit /etc/lvm/lvm.conf: in the log section change level = 7 and uncomment the line file = "/var/log/lvm2.log".

On 15 March 2012 20:50, William Seligman <[email protected]> wrote:
> On 3/15/12 12:55 PM, emmanuel segura wrote:
>
> > I don't see any error, and the answer to your question is yes.
> >
> > Can you show me your /etc/cluster/cluster.conf and your crm configure show?
> > With those, later I can try to see if I can find a fix.
>
> Thanks for taking a look.
>
> My cluster.conf: <http://pastebin.com/w5XNYyAX>
> crm configure show: <http://pastebin.com/atVkXjkn>
>
> Before you spend a lot of time on the second file, remember that clvmd
> will hang whether or not I'm running pacemaker.
>
> > On 15 March 2012 17:42, William Seligman <[email protected]> wrote:
> >
> >> On 3/15/12 12:15 PM, emmanuel segura wrote:
> >>
> >>> How did you create your volume group?
> >>
> >> pvcreate /dev/drbd0
> >> vgcreate -c y ADMIN /dev/drbd0
> >> lvcreate -L 200G -n usr ADMIN  # ... and so on
> >> # "Nevis-HA" is the cluster name I used in cluster.conf
> >> mkfs.gfs2 -p lock_dlm -j 2 -t Nevis_HA:usr /dev/ADMIN/usr  # ... and so on
> >>
> >>> Give me the output of the vgs command when the cluster is up.
> >>
> >> Here it is:
> >>
> >>   Logging initialised at Thu Mar 15 12:40:39 2012
> >>   Set umask from 0022 to 0077
> >>   Finding all volume groups
> >>   Finding volume group "ROOT"
> >>   Finding volume group "ADMIN"
> >>   VG    #PV #LV #SN Attr   VSize   VFree
> >>   ADMIN   1   5   0 wz--nc   2.61t 765.79g
> >>   ROOT    1   2   0 wz--n- 117.16g      0
> >>   Wiping internal VG cache
> >>
> >> I assume the "c" in the ADMIN attributes means that clustering is
> >> turned on?
> >>
> >>> On 15 March 2012 17:06, William Seligman <[email protected]> wrote:
> >>>
> >>>> On 3/15/12 11:50 AM, emmanuel segura wrote:
> >>>>
> >>>>> Yes, William.
> >>>>>
> >>>>> Now try clvmd -d and see what happens.
> >>>>>
> >>>>> locking_type = 3 is the LVM cluster locking type.
> >>>>
> >>>> Since you asked for confirmation, here it is: the output of 'clvmd -d'
> >>>> just now: <http://pastebin.com/bne8piEw>. I crashed the other node at
> >>>> Mar 15 12:02:35, when you see the only additional line of output.
> >>>>
> >>>> I don't see any particular difference between this and the previous
> >>>> result <http://pastebin.com/sWjaxAEF>, which suggests that I had
> >>>> cluster locking enabled before, and still do now.
> >>>>
> >>>>> On 15 March 2012 16:15, William Seligman <[email protected]> wrote:
> >>>>>
> >>>>>> On 3/15/12 5:18 AM, emmanuel segura wrote:
> >>>>>>
> >>>>>>> The first thing I saw in your clvmd log is this:
> >>>>>>>
> >>>>>>> =============================================
> >>>>>>> WARNING: Locking disabled. Be careful! This could corrupt your metadata.
> >>>>>>> =============================================
> >>>>>>
> >>>>>> I saw that too, and thought the same as you did. I did some checks
> >>>>>> (see below), but some web searches suggest that this message is a
> >>>>>> normal consequence of clvmd initialization; e.g.,
> >>>>>>
> >>>>>> <http://markmail.org/message/vmy53pcv52wu7ghx>
> >>>>>>
> >>>>>>> Use this command:
> >>>>>>>
> >>>>>>> lvmconf --enable-cluster
> >>>>>>>
> >>>>>>> And remember: for cman+pacemaker you don't need qdisk.
> >>>>>>
> >>>>>> Before I tried your lvmconf suggestion, here was my /etc/lvm/lvm.conf:
> >>>>>> <http://pastebin.com/841VZRzW> and the output of "lvm dumpconfig":
> >>>>>> <http://pastebin.com/rtw8c3Pf>.
> >>>>>>
> >>>>>> Then I did as you suggested, but with a check to see if anything
> >>>>>> changed:
> >>>>>>
> >>>>>> # cd /etc/lvm/
> >>>>>> # cp lvm.conf lvm.conf.cluster
> >>>>>> # lvmconf --enable-cluster
> >>>>>> # diff lvm.conf lvm.conf.cluster
> >>>>>> #
> >>>>>>
> >>>>>> So the key lines have been there all along:
> >>>>>>
> >>>>>> locking_type = 3
> >>>>>> fallback_to_local_locking = 0
> >>>>>>
> >>>>>>> On 14 March 2012 23:17, William Seligman <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> On 3/14/12 9:20 AM, emmanuel segura wrote:
> >>>>>>>>
> >>>>>>>>> Hello William,
> >>>>>>>>>
> >>>>>>>>> I didn't know you were using drbd, and I don't know what type of
> >>>>>>>>> configuration you're using.
> >>>>>>>>>
> >>>>>>>>> But it's better if you try to start clvm with clvmd -d;
> >>>>>>>>> that way we can see what the problem is.
> >>>>>>>>
> >>>>>>>> For what it's worth, here's the output of running clvmd -d on
> >>>>>>>> the node that stays up: <http://pastebin.com/sWjaxAEF>
> >>>>>>>>
> >>>>>>>> What's probably important in that big mass of output are the
> >>>>>>>> last two lines. Up to that point, I have both nodes up and
> >>>>>>>> running cman + clvmd; cluster.conf is here:
> >>>>>>>> <http://pastebin.com/w5XNYyAX>
> >>>>>>>>
> >>>>>>>> At the time of the next-to-last line, I cut power to the
> >>>>>>>> other node.
> >>>>>>>>
> >>>>>>>> At the time of the last line, I ran "vgdisplay" on the
> >>>>>>>> remaining node, which hangs forever.
> >>>>>>>>
> >>>>>>>> After a lot of web searching, I found that I'm not the only one
> >>>>>>>> with this problem. Here's one case that doesn't seem relevant
> >>>>>>>> to me, since I don't use qdisk:
> >>>>>>>> <http://www.redhat.com/archives/linux-cluster/2007-October/msg00212.html>.
> >>>>>>>> Here's one with the same problem on the same OS, but with no
> >>>>>>>> resolution: <http://bugs.centos.org/view.php?id=5229>.
> >>>>>>>>
> >>>>>>>> Out of curiosity, has anyone on this list made a two-node
> >>>>>>>> cman+clvmd cluster work for them?
> >>>>>>>>
> >>>>>>>>> On 14 March 2012 14:02, William Seligman <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> On 3/14/12 6:02 AM, emmanuel segura wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I think it's better if you make clvmd start at boot:
> >>>>>>>>>>>
> >>>>>>>>>>> chkconfig cman on ; chkconfig clvmd on
> >>>>>>>>>>
> >>>>>>>>>> I've already tried it. It doesn't work. The problem is that
> >>>>>>>>>> my LVM information is on the drbd. If I start up clvmd
> >>>>>>>>>> before drbd, it won't find the logical volumes.
> >>>>>>>>>>
> >>>>>>>>>> I also don't see why that would make a difference (although
> >>>>>>>>>> this could be part of the confusion): a service is a
> >>>>>>>>>> service. I've tried starting up clvmd inside and outside
> >>>>>>>>>> pacemaker control, with the same problem. Why would
> >>>>>>>>>> starting clvmd at boot make a difference?
> >>>>>>>>>>
> >>>>>>>>>>> On 13 March 2012 23:29, William Seligman <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> So if you're using cman, why do you use lsb::clvmd?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think you are very confused.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I don't dispute that I may be very confused!
> >>>>>>>>>>>>
> >>>>>>>>>>>> However, from what I can tell, I still need to run
> >>>>>>>>>>>> clvmd even if I'm running cman (I'm not using
> >>>>>>>>>>>> rgmanager). If I just run cman, gfs2 and any other form
> >>>>>>>>>>>> of mount fails. If I run cman, then clvmd, then gfs2,
> >>>>>>>>>>>> everything behaves normally.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Going by these instructions:
> >>>>>>>>>>>>
> >>>>>>>>>>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
> >>>>>>>>>>>>
> >>>>>>>>>>>> the resources he puts under "cluster control" (rgmanager)
> >>>>>>>>>>>> I have to put under pacemaker control. Those include
> >>>>>>>>>>>> drbd, clvmd, and gfs2.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The difference between what I've got and what's in
> >>>>>>>>>>>> "Clusters From Scratch" is that in CFS they assign one
> >>>>>>>>>>>> DRBD volume to a single filesystem. I create an LVM
> >>>>>>>>>>>> physical volume on my DRBD resource, as in the above
> >>>>>>>>>>>> tutorial, and so I have to start clvmd or the logical
> >>>>>>>>>>>> volumes in the DRBD partition won't be recognized.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is there some way to get logical volumes recognized
> >>>>>>>>>>>> automatically by cman without rgmanager that I've missed?
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 13 March 2012 22:42, William Seligman <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 3/13/12 12:29 PM, William Seligman wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm not sure if this is a "Linux-HA" question; please
> >>>>>>>>>>>>>>> direct me to the appropriate list if it's not.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm setting up a two-node cman+pacemaker+gfs2
> >>>>>>>>>>>>>>> cluster as described in "Clusters From Scratch."
> >>>>>>>>>>>>>>> Fencing is through forcibly rebooting a node by
> >>>>>>>>>>>>>>> cutting and restoring its power via UPS.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> My fencing/failover tests have revealed a problem.
> >>>>>>>>>>>>>>> If I gracefully turn off one node ("crm node
> >>>>>>>>>>>>>>> standby"; "service pacemaker stop"; "shutdown -r
> >>>>>>>>>>>>>>> now"), all the resources transfer to the other node
> >>>>>>>>>>>>>>> with no problems.
> >>>>>>>>>>>>>>> If I cut power to one node (as would happen if it
> >>>>>>>>>>>>>>> were fenced), the lsb::clvmd resource on the
> >>>>>>>>>>>>>>> remaining node eventually fails. Since all the other
> >>>>>>>>>>>>>>> resources depend on clvmd, all the resources on the
> >>>>>>>>>>>>>>> remaining node stop and the cluster is left with
> >>>>>>>>>>>>>>> nothing running.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I've traced why the lsb::clvmd resource fails: the
> >>>>>>>>>>>>>>> monitor/status command includes "vgdisplay", which
> >>>>>>>>>>>>>>> hangs indefinitely. Therefore the monitor will
> >>>>>>>>>>>>>>> always time out.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So this isn't a problem with pacemaker, but with
> >>>>>>>>>>>>>>> clvmd/dlm: if a node is cut off, the cluster isn't
> >>>>>>>>>>>>>>> handling it properly. Has anyone on this list seen
> >>>>>>>>>>>>>>> this before? Any ideas?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Details:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> versions:
> >>>>>>>>>>>>>>> Redhat Linux 6.2 (kernel 2.6.32)
> >>>>>>>>>>>>>>> cman-3.0.12.1
> >>>>>>>>>>>>>>> corosync-1.4.1
> >>>>>>>>>>>>>>> pacemaker-1.1.6
> >>>>>>>>>>>>>>> lvm2-2.02.87
> >>>>>>>>>>>>>>> lvm2-cluster-2.02.87
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This may be a Linux-HA question after all!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I ran a few more tests. Here's the output from a
> >>>>>>>>>>>>>> typical test of
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> <http://pastebin.com/uqC6bc1b>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It looks like what's happening is that the fence
> >>>>>>>>>>>>>> agent (one I wrote) is not returning the proper
> >>>>>>>>>>>>>> error code when a node crashes.
> >>>>>>>>>>>>>> According to this page, if a fencing agent fails,
> >>>>>>>>>>>>>> GFS2 will freeze to protect the data:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As a test, I tried to fence my test node via
> >>>>>>>>>>>>>> standard means:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> stonith_admin -F \
> >>>>>>>>>>>>>>   orestes-corosync.nevis.columbia.edu
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> These were the log messages, which show that
> >>>>>>>>>>>>>> stonith_admin did its job and CMAN was notified of
> >>>>>>>>>>>>>> the fencing: <http://pastebin.com/jaH820Bv>.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Unfortunately, I still got the gfs2 freeze, so this
> >>>>>>>>>>>>>> is not the complete story.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> First things first. I vaguely recall a web page that
> >>>>>>>>>>>>>> went over the STONITH return codes, but I can't
> >>>>>>>>>>>>>> locate it again. Is there any reference for the
> >>>>>>>>>>>>>> return codes expected from a fencing agent, perhaps
> >>>>>>>>>>>>>> as a function of the state of the fencing device?
>
> --
> Bill Seligman             | Phone: (914) 591-2823
> Nevis Labs, Columbia Univ | mailto://[email protected]
> PO Box 137                |
> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

--
esta es mi vida e me la vivo hasta que dios quiera
