DLM isn't the problem; I think it's your fencing. When you powered off the active node, did the dead node stay in the "unclean" state? Can you show me your sbd timeouts?

sbd -d /dev/path_of_your_device dump
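The lines of that dump to look at are the watchdog and msgwait timeouts; they look roughly like this (example values only, not from your cluster):

    Timeout (watchdog) : 5
    Timeout (msgwait)  : 10

The node doing the fencing only considers the victim dead after msgwait has expired, so pacemaker's stonith-timeout should be larger than msgwait (a common rule of thumb is msgwait plus about 20%).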
Thanks

2014-12-29 11:02 GMT+01:00 Marlon Guao <marlon.g...@gmail.com>:
> Hi,
>
> ah yeah.. I tried to power off the active node and ran pvscan on the
> passive, and yes, it didn't work --- it doesn't return to the shell.
> So, the problem is in DLM?
>
> On Mon, Dec 29, 2014 at 5:51 PM, emmanuel segura <emi2f...@gmail.com> wrote:
>
>> Power off the active node and after one second try to use an LVM
>> command, for example pvscan. If this command doesn't respond, it is
>> because DLM relies on cluster fencing: if cluster fencing doesn't
>> work, DLM stays in a blocked state.
>>
>> 2014-12-29 10:43 GMT+01:00 Marlon Guao <marlon.g...@gmail.com>:
>>> Perhaps we need to focus on this message. As mentioned, the cluster is
>>> working fine under normal circumstances. My only concern is that the LVM
>>> resource agent doesn't try to re-activate the VG on the passive node when
>>> the active node goes down ungracefully (powered off). Hence it cannot
>>> mount the filesystems, etc.
>>>
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation sbd_monitor_0: not running (node=s1, call=5, rc=7, cib-update=35, confirmed=true)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 13: monitor dlm:0_monitor_0 on s2
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 5: monitor dlm:1_monitor_0 on s1 (local)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation dlm_monitor_0: not running (node=s1, call=10, rc=7, cib-update=36, confirmed=true)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 14: monitor clvm:0_monitor_0 on s2
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 6: monitor clvm:1_monitor_0 on s1 (local)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation clvm_monitor_0: not running (node=s1, call=15, rc=7, cib-update=37, confirmed=true)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 15: monitor cluIP_monitor_0 on s2
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 7: monitor cluIP_monitor_0 on s1 (local)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation cluIP_monitor_0: not running (node=s1, call=19, rc=7, cib-update=38, confirmed=true)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 8: monitor vg1_monitor_0 on s1 (local)
>>> Dec 29 17:12:26 s1 LVM(vg1)[1583]: WARNING: LVM Volume cluvg1 is not available (stopped)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=39, confirmed=true)
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 17: monitor fs1_monitor_0 on s2
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
>>> Dec 29 17:12:26 s1 Filesystem(fs1)[1600]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist
>>> Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation fs1_monitor_0: not running (node=s1, call=27, rc=7, cib-update=40, confirmed=true)
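For reference: a quick way to confirm on the surviving node that it is DLM waiting for fencing, and not LVM itself, is something like this (assuming dlm_tool from the dlm package is installed):

    dlm_tool ls        # the clvmd lockspace stays stopped/blocked while fencing is pending
    dlm_tool status    # membership and fencing state as dlm sees it
    pvscan             # hangs until the dead node has really been fenced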
>>> On Mon, Dec 29, 2014 at 5:38 PM, emmanuel segura <emi2f...@gmail.com> wrote:
>>>
>>>> Dec 27 15:38:00 s1 cib[1514]: error: crm_xml_err: XML Error: Permission deniedPermission deniedI/O warning : failed to load external entity "/var/lib/pacemaker/cib/cib.xml"
>>>> Dec 27 15:38:00 s1 cib[1514]: error: write_cib_contents: Cannot link /var/lib/pacemaker/cib/cib.xml to /var/lib/pacemaker/cib/cib-0.raw: Operation not permitted (1)
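Side note on those cib errors: they usually mean the files under /var/lib/pacemaker/cib are not owned by the cluster user. A quick check (assuming the usual hacluster:haclient ownership) is roughly:

    ls -ld /var/lib/pacemaker/cib /var/lib/pacemaker/cib/cib.xml
    # the directory and files should belong to hacluster:haclient; if not:
    chown -R hacluster:haclient /var/lib/pacemaker/cib

That is separate from the failover problem, but worth fixing.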
>>>> 2014-12-29 10:33 GMT+01:00 emmanuel segura <emi2f...@gmail.com>:
>>>>> Hi,
>>>>>
>>>>> You have a problem with the cluster stonithd: "error: crm_abort:
>>>>> crm_glib_handler: Forked child 6186 to record non-fatal assert at
>>>>> logging.c:73"
>>>>>
>>>>> Post your cluster version (packages); maybe someone can tell you
>>>>> whether this is a known bug or a new one.
>>>>>
>>>>> 2014-12-29 10:29 GMT+01:00 Marlon Guao <marlon.g...@gmail.com>:
>>>>>> ok, sorry for that.. please use this instead.
>>>>>>
>>>>>> http://pastebin.centos.org/14771/
>>>>>>
>>>>>> thanks.
>>>>>>
>>>>>> On Mon, Dec 29, 2014 at 5:25 PM, emmanuel segura <emi2f...@gmail.com> wrote:
>>>>>>> Sorry, but your paste is empty.
>>>>>>>
>>>>>>> 2014-12-29 10:19 GMT+01:00 Marlon Guao <marlon.g...@gmail.com>:
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> uploaded it here.
>>>>>>>>
>>>>>>>> http://susepaste.org/45413433
>>>>>>>>
>>>>>>>> thanks.
>>>>>>>>
>>>>>>>> On Mon, Dec 29, 2014 at 5:09 PM, Marlon Guao <marlon.g...@gmail.com> wrote:
>>>>>>>>> Ok, I attached the log file of one of the nodes.
>>>>>>>>>
>>>>>>>>> On Mon, Dec 29, 2014 at 4:42 PM, emmanuel segura <emi2f...@gmail.com> wrote:
>>>>>>>>>> Please use pastebin and show your whole logs.
>>>>>>>>>>
>>>>>>>>>> 2014-12-29 9:06 GMT+01:00 Marlon Guao <marlon.g...@gmail.com>:
>>>>>>>>>>> By the way, just to note that for normal testing (manual failover,
>>>>>>>>>>> rebooting the active node) the cluster is working fine. I only
>>>>>>>>>>> encounter this error if I try to poweroff/shutoff the active node.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 29, 2014 at 4:05 PM, Marlon Guao <marlon.g...@gmail.com> wrote:
>>>>>>>>>>>> Hi.
>>>>>>>>>>>>
>>>>>>>>>>>> Dec 29 13:47:16 s1 LVM(vg1)[1601]: WARNING: LVM Volume cluvg1 is not available (stopped)
>>>>>>>>>>>> Dec 29 13:47:16 s1 crmd[1515]: notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=40, confirmed=true)
>>>>>>>>>>>> Dec 29 13:47:16 s1 crmd[1515]: notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
>>>>>>>>>>>> Dec 29 13:47:16 s1 crmd[1515]: notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
>>>>>>>>>>>> Dec 29 13:47:16 s1 Filesystem(fs1)[1618]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist
>>>>>>>>>>>>
>>>>>>>>>>>> From the LVM agent, it checks whether the volume group is already
>>>>>>>>>>>> available and raises the above warning if not. But I don't see that it
>>>>>>>>>>>> tries to activate the VG before raising that error. Perhaps it assumes
>>>>>>>>>>>> that the VG is already activated, so I'm not sure who should be
>>>>>>>>>>>> activating it (should it be LVM?).
>>>>>>>>>>>>
>>>>>>>>>>>> if [ $rc -ne 0 ]; then
>>>>>>>>>>>>         ocf_log $loglevel "LVM Volume $1 is not available (stopped)"
>>>>>>>>>>>>         rc=$OCF_NOT_RUNNING
>>>>>>>>>>>> else
>>>>>>>>>>>>         case $(get_vg_mode) in
>>>>>>>>>>>>         1) # exclusive with tagging.
>>>>>>>>>>>>                 # If vg is running, make sure the correct tag is present.
>>>>>>>>>>>>                 # Otherwise we can not guarantee exclusive activation.
>>>>>>>>>>>>                 if ! check_tags; then
>>>>>>>>>>>>                         ocf_exit_reason "WARNING: $OCF_RESKEY_volgrpname is active without the cluster tag, \"$OUR_TAG\""
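On the question of who activates it: the snippet above is only the monitor/status path. It is the agent's start action that activates the VG, which for exclusive=yes boils down to roughly this (using cluvg1 from the config below):

    vgchange -a ey cluvg1   # exclusive activation; needs clvmd, which needs dlm,
                            # which in turn waits for the dead node to be fenced

So if fencing never completes, start never gets to run on the survivor and the VG stays inactive there.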
>>>>>>>>>>>> On Mon, Dec 29, 2014 at 3:36 PM, emmanuel segura <emi2f...@gmail.com> wrote:
>>>>>>>>>>>>> logs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-29 6:54 GMT+01:00 Marlon Guao <marlon.g...@gmail.com>:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I just want to ask about the LVM resource agent on pacemaker/corosync.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I set up a 2-node cluster (openSUSE 13.2 -- my config below). The
>>>>>>>>>>>>>> cluster works as expected: a manual failover (via crm resource move)
>>>>>>>>>>>>>> and an automatic failover (by rebooting the active node, for instance)
>>>>>>>>>>>>>> both work. But if I just "shut off" the active node (it's a VM, so I
>>>>>>>>>>>>>> can do a poweroff), the resources won't fail over to the passive node.
>>>>>>>>>>>>>> When I investigated, it was due to an LVM resource not starting
>>>>>>>>>>>>>> (specifically, the VG). I found out that the LVM resource won't try to
>>>>>>>>>>>>>> activate the volume group on the passive node. Is this expected
>>>>>>>>>>>>>> behaviour?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I really expect is that, in the event the active node is shut off
>>>>>>>>>>>>>> (by a power outage, for instance), all resources fail over
>>>>>>>>>>>>>> automatically to the passive node and LVM re-activates the VG.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's my config:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> node 1: s1
>>>>>>>>>>>>>> node 2: s2
>>>>>>>>>>>>>> primitive cluIP IPaddr2 \
>>>>>>>>>>>>>>         params ip=192.168.13.200 cidr_netmask=32 \
>>>>>>>>>>>>>>         op monitor interval=30s
>>>>>>>>>>>>>> primitive clvm ocf:lvm2:clvmd \
>>>>>>>>>>>>>>         params daemon_timeout=30 \
>>>>>>>>>>>>>>         op monitor timeout=90 interval=30
>>>>>>>>>>>>>> primitive dlm ocf:pacemaker:controld \
>>>>>>>>>>>>>>         op monitor interval=60s timeout=90s on-fail=ignore \
>>>>>>>>>>>>>>         op start interval=0 timeout=90
>>>>>>>>>>>>>> primitive fs1 Filesystem \
>>>>>>>>>>>>>>         params device="/dev/mapper/cluvg1-clulv1" directory="/data" fstype=btrfs
>>>>>>>>>>>>>> primitive mariadb mysql \
>>>>>>>>>>>>>>         params config="/etc/my.cnf"
>>>>>>>>>>>>>> primitive sbd stonith:external/sbd \
>>>>>>>>>>>>>>         op monitor interval=15s timeout=60s
>>>>>>>>>>>>>> primitive vg1 LVM \
>>>>>>>>>>>>>>         params volgrpname=cluvg1 exclusive=yes \
>>>>>>>>>>>>>>         op start timeout=10s interval=0 \
>>>>>>>>>>>>>>         op stop interval=0 timeout=10 \
>>>>>>>>>>>>>>         op monitor interval=10 timeout=30 on-fail=restart depth=0
>>>>>>>>>>>>>> group base-group dlm clvm
>>>>>>>>>>>>>> group rgroup cluIP vg1 fs1 mariadb \
>>>>>>>>>>>>>>         meta target-role=Started
>>>>>>>>>>>>>> clone base-clone base-group \
>>>>>>>>>>>>>>         meta interleave=true target-role=Started
>>>>>>>>>>>>>> property cib-bootstrap-options: \
>>>>>>>>>>>>>>         dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
>>>>>>>>>>>>>>         cluster-infrastructure=corosync \
>>>>>>>>>>>>>>         no-quorum-policy=ignore \
>>>>>>>>>>>>>>         last-lrm-refresh=1419514875 \
>>>>>>>>>>>>>>         cluster-name=xxx \
>>>>>>>>>>>>>>         stonith-enabled=true
>>>>>>>>>>>>>> rsc_defaults rsc-options: \
>>>>>>>>>>>>>>         resource-stickiness=100
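Looking at that config: stonith is enabled and an sbd primitive is defined, so the thing to verify is that fencing actually completes when a node disappears. A rough way to test it (assuming crmsh and the pacemaker CLI tools, and your real sbd device path):

    crm node fence s2                         # or: stonith_admin --reboot s2
    sbd -d /dev/path_of_your_device list      # the s2 slot should show the reset message

If that fence never completes, dlm and clvmd stay blocked and the LVM start on the surviving node is never attempted.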
--
esta es mi vida e me la vivo hasta que dios quiera
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems