Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
Hi.


Dec 29 13:47:16 s1 LVM(vg1)[1601]: WARNING: LVM Volume cluvg1 is not available (stopped)
Dec 29 13:47:16 s1 crmd[1515]:   notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=40, confirmed=true)
Dec 29 13:47:16 s1 crmd[1515]:   notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
Dec 29 13:47:16 s1 crmd[1515]:   notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
Dec 29 13:47:16 s1 Filesystem(fs1)[1618]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist


Looking at the LVM agent, the monitor action checks whether the volume group is available and raises the warning above if it is not. But I don't see it trying to activate the VG before reporting it as stopped; perhaps it assumes the VG is already activated. So I'm not sure which part should be activating it (should it be the LVM agent itself?).


    if [ $rc -ne 0 ]; then
            ocf_log $loglevel "LVM Volume $1 is not available (stopped)"
            rc=$OCF_NOT_RUNNING
    else
            case $(get_vg_mode) in
            1) # exclusive with tagging.
                    # If vg is running, make sure the correct tag is present.
                    # Otherwise we can not guarantee exclusive activation.
                    if ! check_tags; then
                            ocf_exit_reason "WARNING: $OCF_RESKEY_volgrpname is active without the cluster tag, \"$OUR_TAG\""
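
(For what it's worth, the monitor shown above only reports status; activation is done by the agent's start action. The following is a simplified sketch of what that start path boils down to, not the agent's exact code -- the function name vgchange_activate is mine, but ocf_is_true, OCF_RESKEY_exclusive/volgrpname and the vgchange flags are the real pieces:)

    # sketch: how the start action activates the VG (simplified)
    vgchange_activate() {
            vg="$OCF_RESKEY_volgrpname"
            if ocf_is_true "$OCF_RESKEY_exclusive"; then
                    # exclusive activation (the clvmd/tagging variants differ in detail)
                    vgchange -a ey "$vg" || return $OCF_ERR_GENERIC
            else
                    vgchange -a y "$vg" || return $OCF_ERR_GENERIC
            fi
            return $OCF_SUCCESS
    }

So if no start ever gets scheduled on the passive node, the VG stays deactivated even though the agent is perfectly capable of activating it.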

On Mon, Dec 29, 2014 at 3:36 PM, emmanuel segura emi2f...@gmail.com wrote:

 logs?

 2014-12-29 6:54 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
  Hi,
 
  I just want to ask about the LVM resource agent on pacemaker/corosync.
 
  I set up a 2-node cluster (openSUSE 13.2 -- my config below). The cluster
  works as expected for a manual failover (via crm resource move) and for an
  automatic failover (for instance by rebooting the active node). But if I
  simply switch off the active node (it's a VM, so I can do a poweroff), the
  resources are not able to fail over to the passive node. When I
  investigated, it turned out to be an LVM resource not starting
  (specifically, the VG). I found out that the LVM resource won't try to
  activate the volume group on the passive node. Is this expected behaviour?
 
  What I really expect is that, if the active node is shut off (by a power
  outage, for instance), all resources fail over automatically to the
  passive node and LVM re-activates the VG.
 
 
  here's my config.
 
  node 1: s1
  node 2: s2
  primitive cluIP IPaddr2 \
  params ip=192.168.13.200 cidr_netmask=32 \
  op monitor interval=30s
  primitive clvm ocf:lvm2:clvmd \
  params daemon_timeout=30 \
  op monitor timeout=90 interval=30
  primitive dlm ocf:pacemaker:controld \
  op monitor interval=60s timeout=90s on-fail=ignore \
  op start interval=0 timeout=90
  primitive fs1 Filesystem \
  params device=/dev/mapper/cluvg1-clulv1 directory=/data fstype=btrfs
  primitive mariadb mysql \
  params config=/etc/my.cnf
  primitive sbd stonith:external/sbd \
  op monitor interval=15s timeout=60s
  primitive vg1 LVM \
  params volgrpname=cluvg1 exclusive=yes \
  op start timeout=10s interval=0 \
  op stop interval=0 timeout=10 \
  op monitor interval=10 timeout=30 on-fail=restart depth=0
  group base-group dlm clvm
  group rgroup cluIP vg1 fs1 mariadb \
  meta target-role=Started
  clone base-clone base-group \
  meta interleave=true target-role=Started
  property cib-bootstrap-options: \
  dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
  cluster-infrastructure=corosync \
  no-quorum-policy=ignore \
  last-lrm-refresh=1419514875 \
  cluster-name=xxx \
  stonith-enabled=true
  rsc_defaults rsc-options: \
  resource-stickiness=100
 







-- 
 import this


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
By the way, just to note that for normal testing (a manual failover, or rebooting the active node) the cluster works fine. I only encounter this error when I power off / switch off the active node.



Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
please use pastebin and show your whole logs

-- 
esta es mi vida e me la vivo hasta que dios quiera

Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
hi,

uploaded it here.

http://susepaste.org/45413433

thanks.

On Mon, Dec 29, 2014 at 5:09 PM, Marlon Guao marlon.g...@gmail.com wrote:

 Ok, I attached the log file of one of the nodes.


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
Sorry, but your paste is empty.


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
OK, sorry for that. Please use this instead:

http://pastebin.centos.org/14771/

thanks.


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
Hi,

You have a problem with the cluster: "stonithd: error: crm_abort: crm_glib_handler: Forked child 6186 to record non-fatal assert at logging.c:73".

Please post your cluster version (package versions); maybe someone can tell you whether this is a known bug or a new one.
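
(A quick way to collect that information -- the exact package names below are an assumption and may differ on openSUSE, adjust to what is actually installed:)

    # query the relevant cluster packages
    rpm -q pacemaker corosync resource-agents sbd
    # pacemaker also reports its own version
    pacemakerd --version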




Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
Dec 27 15:38:00 s1 cib[1514]:    error: crm_xml_err: XML Error: Permission deniedPermission deniedI/O warning : failed to load external entity /var/lib/pacemaker/cib/cib.xml
Dec 27 15:38:00 s1 cib[1514]:    error: write_cib_contents: Cannot link /var/lib/pacemaker/cib/cib.xml to /var/lib/pacemaker/cib/cib-0.raw: Operation not permitted (1)
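
(Those "Permission denied" / "Operation not permitted" errors on the CIB files usually point at ownership or permission problems under /var/lib/pacemaker. A quick check -- the expected owner is normally hacluster:haclient, but verify against your distribution's packaging before changing anything:)

    ls -l /var/lib/pacemaker/cib/
    # if the ownership is wrong, something along these lines should fix it
    # (run on the affected node while pacemaker is stopped):
    chown -R hacluster:haclient /var/lib/pacemaker
    chmod 750 /var/lib/pacemaker/cib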


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
Hmm, but as far as I can see those messages can still be ignored. My original problem is that the LVM resource agent doesn't try to activate the VG on the passive node when the active node is powered off.


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
Perhaps we need to focus on this message. As mentioned, the cluster works fine under normal circumstances; my only concern is that the LVM resource agent doesn't try to re-activate the VG on the passive node when the active node goes down ungracefully (powered off). As a result it cannot mount the filesystems, etc.


Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation sbd_monitor_0: not running (node=s1, call=5, rc=7, cib-update=35, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 13: monitor dlm:0_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 5: monitor dlm:1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation dlm_monitor_0: not running (node=s1, call=10, rc=7, cib-update=36, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 14: monitor clvm:0_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 6: monitor clvm:1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation clvm_monitor_0: not running (node=s1, call=15, rc=7, cib-update=37, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 15: monitor cluIP_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 7: monitor cluIP_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation cluIP_monitor_0: not running (node=s1, call=19, rc=7, cib-update=38, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 8: monitor vg1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 LVM(vg1)[1583]: WARNING: LVM Volume cluvg1 is not available (stopped)
Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=39, confirmed=true)
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 17: monitor fs1_monitor_0 on s2
Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
Dec 29 17:12:26 s1 Filesystem(fs1)[1600]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist
Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation fs1_monitor_0: not running (node=s1, call=27, rc=7, cib-update=40, confirmed=true)
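
(Note that everything in this excerpt is an initial probe -- the _monitor_0 operations -- and there is no vg1_start at all. A quick way to confirm whether the cluster ever scheduled a start for the VG on the surviving node; the log path is an assumption, adjust to where corosync/pacemaker log on your system:)

    # look for start attempts versus probes of the LVM resource
    grep -E 'vg1_(start|stop|monitor)_[0-9]+' /var/log/messages
    # and check current resource state and failed actions
    crm_mon -1rf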


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
Hi,

Ah yeah, I tried powering off the active node and then ran pvscan on the passive node, and indeed it didn't work: the command doesn't return to the shell. So the problem is in DLM?

On Mon, Dec 29, 2014 at 5:51 PM, emmanuel segura emi2f...@gmail.com wrote:

 Power off the active node and, after a second or so, try to run an LVM
 command, for example pvscan. If the command doesn't respond, it is because
 DLM relies on cluster fencing: if cluster fencing doesn't work, DLM stays
 in a blocked state.


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
DLM isn't the problem; I think it is your fencing. When you powered off the active node, did the dead node remain in an "unclean" state? Can you show me your sbd timeouts?  sbd -d /dev/path_of_your_device dump

Thanks
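
(A minimal way to check this on the surviving node -- the /dev/mapper/sbd path is taken from the dump later in the thread, and the exact output varies by version:)

    # dump the sbd header / timeouts (as requested above)
    sbd -d /dev/mapper/sbd dump
    # see whether a fence message was ever written to the dead node's slot
    sbd -d /dev/mapper/sbd list
    # check whether the DLM lockspaces look blocked, waiting on fencing
    dlm_tool ls
    dlm_tool status
    # and whether pacemaker still shows the powered-off node as UNCLEAN
    crm_mon -1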


Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
https://bugzilla.redhat.com/show_bug.cgi?id=1127289#c4
https://bugzilla.redhat.com/show_bug.cgi?id=1127289

2014-12-29 11:57 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
 here it is..


 ==Dumping header on disk /dev/mapper/sbd
 Header version : 2.1
 UUID   : 36074673-f48e-4da2-b4ee-385e83e6abcc
 Number of slots: 255
 Sector size: 512
 Timeout (watchdog) : 5
 Timeout (allocate) : 2
 Timeout (loop) : 1
 Timeout (msgwait)  : 10
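  For reference, a rule of thumb rather than anything from this thread: with
  msgwait at 10s, pacemaker's stonith-timeout should be set comfortably higher
  so SBD fencing has time to complete; the value below is only illustrative.

    # illustrative value, well above the sbd msgwait of 10s shown above
    crm configure property stonith-timeout=40s
    # and check that both nodes have clean slots on the shared device
    sbd -d /dev/mapper/sbd list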

 On Mon, Dec 29, 2014 at 6:42 PM, emmanuel segura emi2f...@gmail.com wrote:

 Dlm isn't the problem, but i think is your fencing, when you powered
 off the active node, the dead remain in unclean state? can you show me
 your sbd timeouts? sbd -d /dev/path_of_your_device dump

 Thanks

 2014-12-29 11:02 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
  Hi,
 
  ah yeah.. tried to poweroff the active node.. and tried pvscan on the
  passive.. and yes.. it didn't worked --- it doesn't return to the shell.
  So, the problem is on DLM?
 
  On Mon, Dec 29, 2014 at 5:51 PM, emmanuel segura emi2f...@gmail.com
 wrote:
 
  Power off the active node and after one seconde try to use one lvm
  command, for example pvscan, if this command doesn't response is
  because dlm relay on cluster fencing, if the cluster fencing doesn't
  work the dlm state in blocked state.
 
  2014-12-29 10:43 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
    perhaps, we need to focus on this message. as mentioned.. the cluster is
    working fine under normal circumstances. my only concern is that, LVM
    resource agent doesn't try to re-activate the VG on the passive node when
    the active node goes down ungracefully (powered off). Hence, it could not
    mount the filesystems.. etc.
  
  
    Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation sbd_monitor_0: not running (node=s1, call=5, rc=7, cib-update=35, confirmed=true)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 13: monitor dlm:0_monitor_0 on s2
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 5: monitor dlm:1_monitor_0 on s1 (local)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation dlm_monitor_0: not running (node=s1, call=10, rc=7, cib-update=36, confirmed=true)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 14: monitor clvm:0_monitor_0 on s2
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 6: monitor clvm:1_monitor_0 on s1 (local)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation clvm_monitor_0: not running (node=s1, call=15, rc=7, cib-update=37, confirmed=true)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 15: monitor cluIP_monitor_0 on s2
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 7: monitor cluIP_monitor_0 on s1 (local)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation cluIP_monitor_0: not running (node=s1, call=19, rc=7, cib-update=38, confirmed=true)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 16: monitor vg1_monitor_0 on s2
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 8: monitor vg1_monitor_0 on s1 (local)
    Dec 29 17:12:26 s1 LVM(vg1)[1583]: WARNING: LVM Volume cluvg1 is not available (stopped)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation vg1_monitor_0: not running (node=s1, call=23, rc=7, cib-update=39, confirmed=true)
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 17: monitor fs1_monitor_0 on s2
    Dec 29 17:12:26 s1 crmd[1495]:   notice: te_rsc_command: Initiating action 9: monitor fs1_monitor_0 on s1 (local)
    Dec 29 17:12:26 s1 Filesystem(fs1)[1600]: WARNING: Couldn't find device [/dev/mapper/cluvg1-clulv1]. Expected /dev/??? to exist
    Dec 29 17:12:26 s1 crmd[1495]:   notice: process_lrm_event: Operation fs1_monitor_0: not running (node=s1, call=27, rc=7, cib-update=40, confirmed=true)
  

Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread Marlon Guao
Looks like it's similar to this as well:

http://comments.gmane.org/gmane.linux.highavailability.pacemaker/22398

But could it be that clvm is not activating the VG on the passive node
because it is waiting for quorum?

I'm seeing this in the log as well:

Dec 29 21:18:09 s2 dlm_controld[1776]: 8544 fence work wait for quorum
Dec 29 21:18:12 s2 dlm_controld[1776]: 8547 clvmd wait for quorum
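A quick way to check the two-node quorum setup (a sketch; the file path and
the corosync 2.x/votequorum keys are assumptions about this openSUSE 13.2
install):

  # show the current vote and quorum state
  corosync-quorumtool -s
  # a two-node votequorum cluster normally sets "two_node: 1" (which implies
  # wait_for_all), so losing the peer does not leave dlm/clvmd waiting for quorum
  grep -A3 'quorum {' /etc/corosync/corosync.conf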



On Mon, Dec 29, 2014 at 9:24 PM, Marlon Guao marlon.g...@gmail.com wrote:

 interesting, i'm using the newer pacemaker version..

 pacemaker-1.1.12.git20140904.266d5c2-1.5.x86_64



Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-29 Thread emmanuel segura
You have no-quorum-policy=ignore. In the thread you posted:

Nov 24 09:52:10 nebula3 dlm_controld[6263]: 566 datastores wait for fencing
Nov 24 09:52:10 nebula3 dlm_controld[6263]: 566 clvmd wait for fencing
Nov 24 09:55:10 nebula3 dlm_controld[6263]: 747 fence status 1084811078 receive -125 from 1084811079 walltime 1416819310 local 747

The dependency chain is {lvm} -> {clvmd} -> {dlm} -> {fencing}: if fencing
isn't working, your cluster will be broken.
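
One way to verify that the whole chain actually works, reusing the device path
and node names from this thread (the specific commands are only a suggestion,
not something the poster ran):

  # confirm the sbd timeouts and slot state on the shared device
  sbd -d /dev/mapper/sbd dump
  sbd -d /dev/mapper/sbd list
  # manually fence the other node through pacemaker/stonithd
  stonith_admin --reboot s2
  # once fencing completes, dlm should unblock and LVM commands such as
  # pvscan should respond again on the surviving node
  dlm_tool ls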

2014-12-29 15:46 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
 looks like it's similar to this as well.

 http://comments.gmane.org/gmane.linux.highavailability.pacemaker/22398

 but, could it be because, clvm is not activating the vg on the passive
 node, because it's waiting for quorum?

 seeing this on the log as well.

 Dec 29 21:18:09 s2 dlm_controld[1776]: 8544 fence work wait for quorum
 Dec 29 21:18:12 s2 dlm_controld[1776]: 8547 clvmd wait for quorum



 On Mon, Dec 29, 2014 at 9:24 PM, Marlon Guao marlon.g...@gmail.com wrote:

 interesting, i'm using the newer pacemaker version..

 pacemaker-1.1.12.git20140904.266d5c2-1.5.x86_64



Re: [Linux-HA] pacemaker/heartbeat LVM

2014-12-28 Thread emmanuel segura
logs?

2014-12-29 6:54 GMT+01:00 Marlon Guao marlon.g...@gmail.com:
 Hi,

 just want to ask regarding the LVM resource agent on pacemaker/corosync.

 I setup 2 nodes cluster (opensuse13.2 -- my config below). The cluster
 works as expected, like doing a manual failover (via crm resource move),
 and automatic failover (by rebooting the active node for instance). But, if
 i try to just shutoff the active node (it's a VM, so I can do a
 poweroff). The resources won't be able to failover to the passive node.
 when I did an investigation, it's due to an LVM resource not starting
 (specifically, the VG). I found out that the LVM resource won't try to
 activate the volume group in the passive node. Is this an expected
 behaviour?

 what I really expect is that, in the event that the active node be shutoff
 (by a power outage for instance), all resources should be failover
 automatically to the passive. LVM should re-activate the VG.


 here's my config.

 node 1: s1
 node 2: s2
 primitive cluIP IPaddr2 \
 params ip=192.168.13.200 cidr_netmask=32 \
 op monitor interval=30s
 primitive clvm ocf:lvm2:clvmd \
 params daemon_timeout=30 \
 op monitor timeout=90 interval=30
 primitive dlm ocf:pacemaker:controld \
 op monitor interval=60s timeout=90s on-fail=ignore \
 op start interval=0 timeout=90
 primitive fs1 Filesystem \
 params device=/dev/mapper/cluvg1-clulv1 directory=/data fstype=btrfs
 primitive mariadb mysql \
 params config=/etc/my.cnf
 primitive sbd stonith:external/sbd \
 op monitor interval=15s timeout=60s
 primitive vg1 LVM \
 params volgrpname=cluvg1 exclusive=yes \
 op start timeout=10s interval=0 \
 op stop interval=0 timeout=10 \
 op monitor interval=10 timeout=30 on-fail=restart depth=0
 group base-group dlm clvm
 group rgroup cluIP vg1 fs1 mariadb \
 meta target-role=Started
 clone base-clone base-group \
 meta interleave=true target-role=Started
 property cib-bootstrap-options: \
 dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
 cluster-infrastructure=corosync \
 no-quorum-policy=ignore \
 last-lrm-refresh=1419514875 \
 cluster-name=xxx \
 stonith-enabled=true
 rsc_defaults rsc-options: \
 resource-stickiness=100

 --
 import this



-- 
this is my life and I live it for as long as God wills
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems