Re: [Linux-HA] pacemaker/heartbeat LVM
Hi.

Dec 29 13:47:16 s1 LVM(vg1)[1601]: WARNING: LVM Volume cluvg1 is not available (stopped)
Dec 29 13:47:16 s1 crmd[1515]: notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=40, confirmed=true)
Dec 29 13:47:16 s1 crmd[1515]: notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
Dec 29 13:47:16 s1 crmd[1515]: notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
Dec 29 13:47:16 s1 Filesystem(fs1)[1618]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist

Looking at the LVM agent, its status/monitor path checks whether the volume group is already available and raises the warning above if it is not. But I don't see that it tries to activate the VG before raising that error. Perhaps it assumes the VG is already activated, so I'm not sure which component should be activating it (should it be the LVM agent itself?). The relevant excerpt:

    if [ $rc -ne 0 ]; then
        ocf_log $loglevel "LVM Volume $1 is not available (stopped)"
        rc=$OCF_NOT_RUNNING
    else
        case $(get_vg_mode) in
        1)
            # exclusive with tagging.
            # If vg is running, make sure the correct tag is present. Otherwise we
            # can not guarantee exclusive activation.
            if ! check_tags; then
                ocf_exit_reason "WARNING: $OCF_RESKEY_volgrpname is active without the cluster tag, \"$OUR_TAG\""

On Mon, Dec 29, 2014 at 3:36 PM, emmanuel segura emi2f...@gmail.com wrote:
logs?

2014-12-29 6:54 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
Hi, I just want to ask about the LVM resource agent on pacemaker/corosync. I set up a 2-node cluster (openSUSE 13.2 -- my config below). The cluster works as expected: manual failover (via "crm resource move") and automatic failover (by rebooting the active node, for instance) both work. But if I simply shut off the active node (it's a VM, so I can do a hard poweroff), the resources fail to move to the passive node. When I investigated, it turned out to be an LVM resource not starting (specifically, the VG): the LVM resource never tries to activate the volume group on the passive node. Is this expected behaviour? What I expect is that if the active node is shut off (by a power outage, for instance), all resources fail over automatically to the passive node and the LVM agent re-activates the VG. Here's my config.
node 1: s1
node 2: s2
primitive cluIP IPaddr2 \
        params ip=192.168.13.200 cidr_netmask=32 \
        op monitor interval=30s
primitive clvm ocf:lvm2:clvmd \
        params daemon_timeout=30 \
        op monitor timeout=90 interval=30
primitive dlm ocf:pacemaker:controld \
        op monitor interval=60s timeout=90s on-fail=ignore \
        op start interval=0 timeout=90
primitive fs1 Filesystem \
        params device=/dev/mapper/cluvg1-clulv1 directory=/data fstype=btrfs
primitive mariadb mysql \
        params config=/etc/my.cnf
primitive sbd stonith:external/sbd \
        op monitor interval=15s timeout=60s
primitive vg1 LVM \
        params volgrpname=cluvg1 exclusive=yes \
        op start timeout=10s interval=0 \
        op stop interval=0 timeout=10 \
        op monitor interval=10 timeout=30 on-fail=restart depth=0
group base-group dlm clvm
group rgroup cluIP vg1 fs1 mariadb \
        meta target-role=Started
clone base-clone base-group \
        meta interleave=true target-role=Started
property cib-bootstrap-options: \
        dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
        cluster-infrastructure=corosync \
        no-quorum-policy=ignore \
        last-lrm-refresh=1419514875 \
        cluster-name=xxx \
        stonith-enabled=true
rsc_defaults rsc-options: \
        resource-stickiness=100
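For reference, the stock LVM resource agent activates the volume group in its start action, not in monitor; the status/monitor path quoted above only reports whether the VG is active. Under exclusive activation with clvmd, the start path essentially comes down to a vgchange call. The sketch below is illustrative only (the function name and the error handling are simplified, not the agent's verbatim code):

    # minimal sketch of what "start" does for a clvmd-backed, exclusively
    # activated VG; assumes the OCF shell functions have been sourced
    LVM_start_sketch() {
        vgname="$1"
        # request cluster-wide exclusive activation through clvmd
        ocf_run vgchange -a ey "$vgname" || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    }

So if the VG never becomes active on the surviving node, the question is why the start action was never run (or never succeeded) there, which is what the rest of this thread ends up investigating (fencing and quorum), rather than anything in the monitor code shown above.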
Re: [Linux-HA] pacemaker/heartbeat LVM
By the way, just to note: for normal testing (manual failover, rebooting the active node) the cluster works fine. I only run into this error when I power off / shut off the active node.
Re: [Linux-HA] pacemaker/heartbeat LVM
Please use pastebin and show your whole logs.
Re: [Linux-HA] pacemaker/heartbeat LVM
Hi, I uploaded it here: http://susepaste.org/45413433

Thanks.

On Mon, Dec 29, 2014 at 5:09 PM, Marlon Guao marlon.g...@gmail.com wrote:
OK, I attached the log file of one of the nodes.
Re: [Linux-HA] pacemaker/heartbeat LVM
Sorry, but your paste is empty.
Re: [Linux-HA] pacemaker/heartbeat LVM
OK, sorry for that. Please use this instead: http://pastebin.centos.org/14771/

Thanks.
Re: [Linux-HA] pacemaker/heartbeat LVM
Hi,

You have a problem with the cluster:

stonithd: error: crm_abort: crm_glib_handler: Forked child 6186 to record non-fatal assert at logging.c:73

Please post your cluster version (packages); maybe someone can tell you whether this is a known bug or a new one.
Re: [Linux-HA] pacemaker/heartbeat LVM
Dec 27 15:38:00 s1 cib[1514]: error: crm_xml_err: XML Error: Permission denied  Permission denied  I/O warning : failed to load external entity /var/lib/pacemaker/cib/cib.xml
Dec 27 15:38:00 s1 cib[1514]: error: write_cib_contents: Cannot link /var/lib/pacemaker/cib/cib.xml to /var/lib/pacemaker/cib/cib-0.raw: Operation not permitted (1)
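Those two cib errors point at filesystem permissions on the CIB directory rather than at the failover logic itself. A quick way to check, assuming the usual hacluster:haclient ownership that Pacemaker packages use (paths and group names can differ between distributions):

    # the cib daemon runs unprivileged, so it must own its state directory
    ls -ld /var/lib/pacemaker/cib
    ls -l /var/lib/pacemaker/cib/cib.xml*
    # if the files ended up owned by root (for example after editing them as root),
    # reset the ownership and restart pacemaker on that node
    chown -R hacluster:haclient /var/lib/pacemaker/cib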
Re: [Linux-HA] pacemaker/heartbeat LVM
Hmm, but as far as I can see, those messages can still be ignored. My original problem is that the LVM resource agent doesn't try to activate the VG on the passive node when the active node is powered off.
Re: [Linux-HA] pacemaker/heartbeat LVM
Perhaps we need to focus on this message. As mentioned, the cluster works fine under normal circumstances; my only concern is that the LVM resource agent doesn't try to re-activate the VG on the passive node when the active node goes down ungracefully (powered off), so it cannot mount the filesystems, etc.

Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation sbd_monitor_0: not running (node=s1, call=5, rc=7, cib-update=35, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 13: monitor dlm:0_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 5: monitor dlm:1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation dlm_monitor_0: not running (node=s1, call=10, rc=7, cib-update=36, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 14: monitor clvm:0_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 6: monitor clvm:1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation clvm_monitor_0: not running (node=s1, call=15, rc=7, cib-update=37, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 15: monitor cluIP_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 7: monitor cluIP_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation cluIP_monitor_0: not running (node=s1, call=19, rc=7, cib-update=38, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 8: monitor vg1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 LVM(vg1)[1583]: WARNING: LVM Volume cluvg1 is not available (stopped)
Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=39, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 17: monitor fs1_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]: notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 Filesystem(fs1)[1600]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist
Dec 29 17:12:26 s1 crmd[1495]: notice: process_lrm_event: Operation fs1_monitor_0: not running (node=s1, call=27, rc=7, cib-update=40, confirmed=true)
Re: [Linux-HA] pacemaker/heartbeat LVM
Hi,

Ah yeah, I powered off the active node and then tried pvscan on the passive one, and indeed it doesn't work: the command never returns to the shell. So the problem is with DLM?

On Mon, Dec 29, 2014 at 5:51 PM, emmanuel segura emi2f...@gmail.com wrote:
Power off the active node and after one second try an LVM command, for example pvscan. If the command doesn't respond, that's because DLM relies on cluster fencing: if cluster fencing doesn't work, DLM stays in a blocked state.
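A rough way to confirm that diagnosis from the surviving node, assuming the dlm and Pacemaker command-line tools are installed (exact output differs between versions):

    crm_mon -1       # the powered-off node should show as UNCLEAN until fencing succeeds
    dlm_tool status  # overall dlm_controld state
    dlm_tool ls      # per-lockspace state (e.g. the clvmd lockspace) while recovery is blocked
    pvscan           # hangs while clvmd/dlm are blocked; returns once fencing completes

If crm_mon keeps reporting the dead node as UNCLEAN and pvscan never returns, DLM is simply waiting for a fencing confirmation that never arrives.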
Re: [Linux-HA] pacemaker/heartbeat LVM
DLM isn't the problem; I think it's your fencing. When you powered off the active node, did the dead node remain in an unclean state? Can you show me your sbd timeouts?

    sbd -d /dev/path_of_your_device dump

Thanks
Re: [Linux-HA] pacemaker/heartbeat LVM
https://bugzilla.redhat.com/show_bug.cgi?id=1127289#c4
https://bugzilla.redhat.com/show_bug.cgi?id=1127289

2014-12-29 11:57 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
Here it is:

==Dumping header on disk /dev/mapper/sbd
Header version     : 2.1
UUID               : 36074673-f48e-4da2-b4ee-385e83e6abcc
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
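For context, the dump above gives watchdog=5s and msgwait=10s. A point worth checking (this is a general guideline, not something verified against this cluster) is that Pacemaker's stonith-timeout should be comfortably larger than sbd's msgwait; otherwise the fence action can be considered failed before sbd has had time to complete it, and DLM stays blocked exactly as described earlier. For example:

    # see what stonith/fencing timeouts are currently configured
    crm configure show | grep -i -e stonith -e timeout
    # illustrative value only: leave clear headroom above msgwait (10s here)
    crm configure property stonith-timeout=30s
    # the sbd header can be re-checked at any time
    sbd -d /dev/mapper/sbd dump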
Re: [Linux-HA] pacemaker/heartbeat LVM
Looks like it's similar to this as well: http://comments.gmane.org/gmane.linux.highavailability.pacemaker/22398
But could it be that clvmd is not activating the VG on the passive node because it's waiting for quorum? I'm seeing this in the log as well:

Dec 29 21:18:09 s2 dlm_controld[1776]: 8544 fence work wait for quorum
Dec 29 21:18:12 s2 dlm_controld[1776]: 8547 clvmd wait for quorum

On Mon, Dec 29, 2014 at 9:24 PM, Marlon Guao marlon.g...@gmail.com wrote:
Interesting; I'm using the newer pacemaker version: pacemaker-1.1.12.git20140904.266d5c2-1.5.x86_64
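(On the "wait for quorum" messages: dlm_controld and clvmd take their quorum information from corosync itself, so pacemaker's no-quorum-policy=ignore does not unblock them. On corosync 2.x the usual way to let a two-node cluster keep quorum when one node dies is votequorum's two_node mode. A sketch of the relevant corosync.conf fragment and a quick check; treat the exact values and file layout as an example for this setup, not as the posters' configuration:

    # /etc/corosync/corosync.conf (fragment)
    quorum {
        provider: corosync_votequorum
        two_node: 1    # keep quorum with a single surviving node;
                       # implies wait_for_all, so both nodes must be seen once at startup
    }

    # After restarting corosync, confirm what it reports while one node is down
    corosync-quorumtool -s
)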
Re: [Linux-HA] pacemaker/heartbeat LVM
You have no-quorum-policy=ignore. In the thread you posted:

Nov 24 09:52:10 nebula3 dlm_controld[6263]: 566 datastores wait for fencing
Nov 24 09:52:10 nebula3 dlm_controld[6263]: 566 clvmd wait for fencing
Nov 24 09:55:10 nebula3 dlm_controld[6263]: 747 fence status 1084811078 receive -125 from 1084811079 walltime 1416819310 local 747

The dependency chain is {lvm} -> {clvmd} -> {dlm} -> {fencing}: if fencing isn't working :) your cluster will be broken.

2014-12-29 15:46 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
Looks like it's similar to this as well: http://comments.gmane.org/gmane.linux.highavailability.pacemaker/22398
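(If fencing is the suspect, it can be exercised directly instead of waiting for a real failure, since dlm only unblocks once a fence has been confirmed. A rough sketch, reusing the sbd device path and node names from this thread; the exact commands are an illustration of how one might test it, not steps taken by the original posters:

    # Ask pacemaker/stonithd to fence the other node and watch its state
    stonith_admin --reboot s2
    crm_mon -1          # s2 should be fenced, not left UNCLEAN

    # Or drive sbd by hand: list the slots and write a test message into s2's slot
    sbd -d /dev/mapper/sbd list
    sbd -d /dev/mapper/sbd message s2 test

    # While the dead node is being fenced, check whether dlm is still waiting
    dlm_tool ls
)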