Re: [Linux-HA] DRBD does not switch resources to other node properly

Dejan Muhamedagic Thu, 16 Apr 2009 10:39:41 -0700

Hi,

On Thu, Apr 16, 2009 at 10:11:26AM -0700, Ethan Bannister wrote:
> 
> Perhaps someone can give me some insight into what I may be doing
> wrong.  I would like DRBD to promote on the secondary machine when
> the Ethernet connection to the initiator on my SAN goes down.  When I pull
> the cable or bring down eth0, on which the IPaddr resides, this is what
> crm_mon shows me soon after:
> 
> ============
> Last updated: Thu Apr 16 12:38:36 2009
> Current DC: init2.mydomain.com (1d3814dc-7928-4beb-99f6-c7ade09056a5) - partition with quorum
> Version: 1.0.3-b133b3f19797c00f9189f4b66b513963f9d25db9
> 4 Nodes configured, unknown expected votes
> 8 Resources configured.
> ============
> 
> Online: [ san2.mydomain.com init2.mydomain.com init1.mydomain.com ]
> OFFLINE: [ san1.mydomain.com ]
> 
> Resource Group: G_Target
>     R_IP_Target (ocf::heartbeat:IPaddr2):     Started san2.mydomain.com
>     R_tgtd    (ocf::acs:tgtdra):      Started san2.mydomain.com
> Master/Slave Set: ms-drbd0
>         Masters: [ san2.mydomain.com ]
>         Stopped: [ drbd0:0 ]     <---------- correct
> Master/Slave Set: ms-drbd1
>         Masters: [ san2.mydomain.com ]
>         Stopped: [ drbd1:1 ]     <---------- incorrect
> Master/Slave Set: ms-drbd2
>         Masters: [ san2.mydomain.com ]
>         Stopped: [ drbd2:0 ]     <---------- correct
> Clone Set: pingd
>         Started: [ init1.mydomain.com init2.mydomain.com san2.mydomain.com ]
>         Stopped: [ R_pingd:2 ]
> 
> Failed actions:
>     drbd1:1_promote_0 (node=san2.mydomain.com, call=43, rc=1, status=complete): unknown error


Does drbd report any errors in the logs (look for lrmd.*drbd)?
This looks like a resource or a drbd RA issue.
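
A minimal way to run that search is a plain grep for the same pattern. This is a sketch only: the function name is made up for illustration, and the default log path is an assumption, because heartbeat may log to syslog (often /var/log/messages) or to its own file depending on the logfacility/logfile settings in ha.cf.

```shell
# Hypothetical helper: print log lines that match "lrmd.*drbd".
# ASSUMPTION: default log path; pass the real heartbeat log as $1
# if ha.cf sends messages somewhere else.
drbd_lrmd_lines() {
    grep -E 'lrmd.*drbd' "${1:-/var/log/messages}"
}
```

On san2, the lines around the failed drbd1:1 promote (call=43, rc=1) are the interesting ones, for example `drbd_lrmd_lines /var/log/messages | grep drbd1`.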

Thanks,

Dejan

> As you can see, drbd0 and drbd2 promote with no issues, but drbd1 fails
> to promote.  I have checked my constraints and tweaked the start-delay
> settings, but I still cannot get the behavior I want.  I also have two
> initiators for redundancy, and I want the initiator to stay up if the
> network goes down on either target.  This has been puzzling me for
> some time now.  Any help would be greatly appreciated.
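
A guess worth checking before touching the constraints again: DRBD refuses to become Primary unless at least one disk of the resource is UpToDate, so if the local disk of drbd1 on san2 is Outdated or Inconsistent while san1 is gone, the promote would fail even though drbd0 and drbd2 succeed. `drbdadm dstate drbd1` on san2 shows this directly. The sketch below reads the same field from /proc/drbd; it assumes the DRBD 8.x /proc/drbd layout and that resource drbd1 is minor 1 (/dev/drbd1), which the configuration here does not confirm, and the function name is hypothetical.

```shell
# Hypothetical helper: print the local disk state of one DRBD minor.
# $1 = minor number, $2 = /proc/drbd-style file (defaults to /proc/drbd).
# ASSUMPTION: DRBD 8.x layout, e.g. " 1: cs:... ds:Local/Peer ...".
drbd_local_dstate() {
    awk -v m="$1:" '
        $1 == m {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ds:/) {
                    # "ds:Outdated/DUnknown" -> "Outdated"
                    split(substr($i, 4), parts, "/")
                    print parts[1]
                    exit
                }
            }
        }' "${2:-/proc/drbd}"
}
```

Anything other than UpToDate from `drbd_local_dstate 1` on san2 would point at the DRBD data state rather than at the cluster configuration.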

> Here is my configuration as shown by the crm CLI:
> 
> node $id="cee46f54-d517-4e4d-b0b8-3076fbc5751b" san2.mydomain.com \
>         attributes standby="off"
> node $id="bde24914-1235-4dc4-8686-f05fd9e6a35e" san1.mydomain.com \
>         attributes standby="off"
> node $id="1d3814dc-7928-4beb-99f6-c7ade09056a5" init2.mydomain.com \
>         attributes standby="off"
> node $id="a058cd72-b27e-4593-ac7e-d79db0709c15" init1.mydomain.com \
>         attributes standby="off"
> primitive R_IP_Target ocf:heartbeat:IPaddr2 \
>         params ip="192.168.*.*" \
>         params nic="eth0" \
>         params iflabel="1" \
>         op monitor interval="30s"
> primitive R_tgtd ocf:acs:tgtdra \
>         op monitor interval="30s" \
>         op start interval="0" timeout="30s" start-delay="2s"
> primitive R_IP_Init ocf:heartbeat:IPaddr2 \
>         params ip="192.168.*.*" \
>         params nic="eth0" \
>         params iflabel="1" \
>         op monitor interval="30s"
> primitive R_iscsi ocf:heartbeat:iscsi \
>         params target="target1.mydomain.com:san.targets" \
>         params portal="192.168.*.*" \
>         op monitor interval="30s" \
>         op start interval="0" timeout="30s" start-delay="5s" \
>         meta is-managed="true"
> primitive R_LVM ocf:heartbeat:LVM \
>         params volgrpname="VolGroup01" \
>         op monitor interval="30s" \
>         op start interval="0" timeout="30s" start-delay="5s" \
>         meta is-managed="true"
> primitive R_Filesystem ocf:heartbeat:Filesystem \
>         params device="/dev/VolGroup01/LogVol00" \
>         params directory="/san_targets/www" \
>         params fstype="ext3" \
>         op monitor interval="30s" \
>         op start interval="0" timeout="30s" start-delay="5s"
> primitive R_NFS ocf:heartbeat:nfsserver \
>         params nfs_init_script="/etc/init.d/nfs" \
>         params nfs_notify_cmd="/sbin/rpc.statd" \
>         params nfs_shared_infodir="/san_targets/www/nfsinfo" \
>         op monitor interval="30s"
> primitive drbd0 ocf:heartbeat:drbd \
>         params drbd_resource="drbd0" \
>         op monitor interval="29s" role="Master" timeout="30s" \
>         op monitor interval="30s" role="Slave" timeout="30s" \
>         op start interval="0" timeout="30s" start-delay="10s"
> primitive drbd1 ocf:heartbeat:drbd \
>         params drbd_resource="drbd1" \
>         op monitor interval="29s" role="Master" timeout="30s" \
>         op monitor interval="30s" role="Slave" timeout="30s" \
>         op start interval="0" timeout="30s" start-delay="10s"
> primitive drbd2 ocf:heartbeat:drbd \
>         params drbd_resource="drbd2" \
>         op monitor interval="29s" role="Master" timeout="30s" \
>         op monitor interval="30s" role="Slave" timeout="30s" \
>         op start interval="0" timeout="30s" start-delay="10s"
> primitive R_pingd ocf:pacemaker:pingd
> primitive R_Failover_Alert_Init ocf:heartbeat:MailTo2 \
>         params sender="[email protected]" \
>         params email="[email protected],[email protected]" \
>         params subject="ACS Init"
> primitive R_Failover_Alert_Target ocf:heartbeat:MailTo2 \
>         params sender="[email protected]" \
>         params email="[email protected],[email protected]" \
>         params subject="ACS San"
> group G_Target R_IP_Target R_tgtd \
>         meta target-role="Started"
> group G_Init R_IP_Init R_iscsi R_LVM R_Filesystem R_NFS \
>         meta target-role="Stopped"
> ms ms-drbd0 drbd0 \
>         meta clone-max="2" notify="true" globally-unique="false" \
>         target-role="Started"
> ms ms-drbd1 drbd1 \
>         meta clone-max="2" notify="true" globally-unique="false" \
>         target-role="Started"
> ms ms-drbd2 drbd2 \
>         meta clone-max="2" notify="true" globally-unique="false" \
>         target-role="Started"
> clone pingd R_pingd \
>         meta target-role="Started"
> clone Failover_Alert_Init R_Failover_Alert_Init \
>         meta clone-max="2" target-role="Stopped"
> clone Failover_Alert_Target R_Failover_Alert_Target \
>         meta clone-max="2" target-role="Stopped"
> location pingd-node-1 pingd 500: init1.mydomain.com
> location pingd-node-2 pingd 500: init2.mydomain.com
> location pingd-node-3 pingd 500: san1.mydomain.com
> location pingd-node-4 pingd 500: san2.mydomain.com
> location ms-drbd0-pref-1 ms-drbd0 200: san1.mydomain.com
> location ms-drbd0-pref-2 ms-drbd0 100: san2.mydomain.com
> location ms-drbd1-pref-1 ms-drbd1 200: san1.mydomain.com
> location ms-drbd1-pref-2 ms-drbd1 100: san2.mydomain.com
> location ms-drbd2-pref-1 ms-drbd2 200: san1.mydomain.com
> location ms-drbd2-pref-2 ms-drbd2 100: san2.mydomain.com
> location G_Target-pref-1 G_Target 200: san1.mydomain.com
> location G_Target-pref-2 G_Target 100: san2.mydomain.com
> location G_Init-pref-1 G_Init 200: init1.mydomain.com
> location G_Init-pref-2 G_Init 100: init2.mydomain.com
> location Failover-Alert-node1 Failover_Alert_Init 200: init1.mydomain.com
> location Failover-Alert-node2 Failover_Alert_Init 100: init2.mydomain.com
> location Failover-Alert-node3 Failover_Alert_Target 200: san1.mydomain.com
> location Failover-Alert-node4 Failover_Alert_Target 100: san2.mydomain.com
> colocation G_Target-on-ms-drbd0 inf: G_Target ms-drbd0:Master
> colocation G_Target-on-ms-drbd1 inf: G_Target ms-drbd1:Master
> colocation G_Target-on-ms-drbd2 inf: G_Target ms-drbd2:Master
> order ms-drbd0-before-ms-drbd1 inf: ms-drbd0:promote ms-drbd1:promote
> order ms-drbd1-before-ms-drbd2 inf: ms-drbd1:promote ms-drbd2:promote
> order ms-drbd2-before-G_Target inf: ms-drbd2:promote G_Target:start
> order G_Target-before-G_Init inf: G_Target:start G_Init:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.3-b133b3f19797c00f9189f4b66b513963f9d25db9" \
>         stonith-enabled="false" \
>         stonith-action="reboot" \
>         stop-orphan-resources="true" \
>         stop-orphan-actions="true" \
>         symmetric-cluster="false" \
>         last-lrm-refresh="1239899583" \
>         default-resource-stickiness="INFINITY"
> 
> Any ideas?
> -- 
> View this message in context: 
> http://www.nabble.com/DRBD-does-not-switch-resources-to-other-node-properly-tp23082432p23082432.html
> Sent from the Linux-HA mailing list archive at Nabble.com.
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
