I took a look at /var/log/messages to see what may be going on. This is what caught my eye on san2:

Apr 16 14:04:39 san2 lrmd: [12984]: info: rsc:drbd1:1: promote
Apr 16 14:04:40 san2 crmd: [12987]: info: process_lrm_event: LRM operation drbd0:1_monitor_29000 (call=107, rc=8, cib-update=133, confirmed=false) complete master
Apr 16 14:04:41 san2 lrmd: [12984]: info: RA output: (drbd1:1:promote:stdout) /dev/drbd1: State change failed: (-1) Multiple primaries not allowed by config
Command 'drbdsetup /dev/drbd1 primary' terminated with exit code 11
Apr 16 14:04:41 san2 drbd[6372]: [6459]: ERROR: drbd1 promote: Not primary despite drbdadm call.
Apr 16 14:04:41 san2 crmd: [12987]: info: process_lrm_event: LRM operation drbd1:1_promote_0 (call=108, rc=1, cib-update=134, confirmed=true) complete unknown error
Apr 16 14:04:41 san2 kernel: drbd1: peer( Primary -> Secondary )
Apr 16 14:04:42 san2 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Apr 16 14:04:42 san2 kernel: drbd1: Writing meta data super block now.
Apr 16 14:04:42 san2 kernel: drbd1: asender terminated
Apr 16 14:04:42 san2 kernel: drbd1: Terminating asender thread
Apr 16 14:04:42 san2 kernel: drbd1: tl_clear()
Apr 16 14:04:42 san2 kernel: drbd1: Connection closed
Apr 16 14:04:42 san2 kernel: drbd1: conn( TearDown -> Unconnected )
Apr 16 14:04:42 san2 kernel: drbd1: receiver terminated
Apr 16 14:04:42 san2 kernel: drbd1: Restarting receiver thread
Apr 16 14:04:42 san2 kernel: drbd1: receiver (re)started
Apr 16 14:04:42 san2 kernel: drbd1: conn( Unconnected -> WFConnection )
Apr 16 14:04:42 san2 crmd: [12987]: info: do_lrm_rsc_op: Performing key=174:43:0:90b5d1cc-a955-48e8-a1a6-7a2674a8c783 op=drbd1:1_notify_0 )
Apr 16 14:04:42 san2 lrmd: [12984]: info: rsc:drbd1:1: notify
Apr 16 14:04:42 san2 crm_master: [6492]: info: Invoked: /usr/sbin/crm_master -l reboot -v 10
Apr 16 14:04:42 san2 attrd: [12986]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd1:1
Apr 16 14:04:42 san2 attrd: [12986]: info: attrd_perform_update: Sent update 118: master-drbd1:1=10
Apr 16 14:04:42 san2 lrmd: [12984]: info: RA output: (drbd1:1:notify:stdout) 0 Trying master-drbd1:1=10 update via attrd
Apr 16 14:04:42 san2 crmd: [12987]: info: process_lrm_event: LRM operation drbd1:1_notify_0 (call=109, rc=0, cib-update=135, confirmed=true) complete ok
Apr 16 14:04:43 san2 crmd: [12987]: info: do_lrm_rsc_op: Performing key=170:44:0:90b5d1cc-a955-48e8-a1a6-7a2674a8c783 op=drbd1:1_notify_0 )
Apr 16 14:04:43 san2 lrmd: [12984]: info: rsc:drbd1:1: notify

I take it that it is not demoting on san1 for some odd reason. The promote exits with code 11 and states that multiple primaries are not allowed by the config, which is true. What I can't get past is why it only does this to drbd1 and not to drbd0 or drbd2. I just upgraded to the new CentOS 5.3, and I am using the most up-to-date versions of pacemaker and heartbeat. I am also using the RA for drbd that came with the heartbeat package. Is there another log that may give me more insight?

Dejan Muhamedagic wrote:
>
> Hi,
>
> On Thu, Apr 16, 2009 at 10:11:26AM -0700, Ethan Bannister wrote:
>>
>> Perhaps someone may be able to give me a little insight on what I may be
>> doing wrong. I would like to have DRBD promote on secondary machine when
>> the Ethernet connection to the initiator on my SAN goes down. When I pull
>> the cable or bring eth0 down which IPaddr resides on, this is what crm_mon
>> shows me soon after:
>>
>> ============
>> Last updated: Thu Apr 16 12:38:36 2009
>> Current DC: init2.mydomain.com (1d3814dc-7928-4beb-99f6-c7ade09056a5) - partition with quorum
>> Version: 1.0.3-b133b3f19797c00f9189f4b66b513963f9d25db9
>> 4 Nodes configured, unknown expected votes
>> 8 Resources configured.
>> ============
>>
>> Online: [ san2.mydomain.com init2.mydomain.com init1.mydomain.com ]
>> OFFLINE: [ san1.mydomain.com ]
>>
>> Resource Group: G_Target
>>     R_IP_Target (ocf::heartbeat:IPaddr2): Started san2.mydomain.com
>>     R_tgtd (ocf::acs:tgtdra): Started san2.mydomain.com
>> Master/Slave Set: ms-drbd0
>>     Masters: [ san2.mydomain.com ]
>>     Stopped: [ drbd0:0 ] <---------- correct
>> Master/Slave Set: ms-drbd1
>>     Masters: [ san2.mydomain.com ]
>>     Stopped: [ drbd1:1 ] <---------- incorrect
>> Master/Slave Set: ms-drbd2
>>     Masters: [ san2.mydomain.com ]
>>     Stopped: [ drbd2:0 ] <---------- correct
>> Clone Set: pingd
>>     Started: [ init1.mydomain.com init2.mydomain.com san2.mydomain.com ]
>>     Stopped: [ R_pingd:2 ]
>>
>> Failed actions:
>>     drbd1:1_promote_0 (node=san2.mydomain.com, call=43, rc=1, status=complete): unknown error
>
> Does drbd report any error in the logs (look for lrmd.*drbd)?
> This looks like a resource or a drbd RA issue.
>
> Thanks,
>
> Dejan
>
>> As you can see, drbd0 and drbd2 promote with no issues. But drbd1 is not
>> promoting properly. I have checked my constraints, and I have tweaked out
>> the start-delay settings, but nothing happens the way I would like. I have
>> two initiators for redundancy as well. But I want the initiator to stay up
>> if the network goes down on either target. This has been puzzling me for
>> some time now. Any help would be greatly appreciated.
>
>> Here is what I have for a crm cli config:
>>
>> node $id="cee46f54-d517-4e4d-b0b8-3076fbc5751b" san2.mydomain.com \
>>     attributes standby="off"
>> node $id="bde24914-1235-4dc4-8686-f05fd9e6a35e" san1.mydomain.com \
>>     attributes standby="off"
>> node $id="1d3814dc-7928-4beb-99f6-c7ade09056a5" init2.mydomain.com \
>>     attributes standby="off"
>> node $id="a058cd72-b27e-4593-ac7e-d79db0709c15" init1.mydomain.com \
>>     attributes standby="off"
>> primitive R_IP_Target ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.*.*" \
>>     params nic="eth0" \
>>     params iflabel="1" \
>>     op monitor interval="30s"
>> primitive R_tgtd ocf:acs:tgtdra \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="30s" start-delay="2s"
>> primitive R_IP_Init ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.*.*" \
>>     params nic="eth0" \
>>     params iflabel="1" \
>>     op monitor interval="30s"
>> primitive R_iscsi ocf:heartbeat:iscsi \
>>     params target="target1.mydomain.com:san.targets" \
>>     params portal="192.168.*.*" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="30s" start-delay="5s" \
>>     meta is-managed="true"
>> primitive R_LVM ocf:heartbeat:LVM \
>>     params volgrpname="VolGroup01" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="30s" start-delay="5s" \
>>     meta is-managed="true"
>> primitive R_Filesystem ocf:heartbeat:Filesystem \
>>     params device="/dev/VolGroup01/LogVol00" \
>>     params directory="/san_targets/www" \
>>     params fstype="ext3" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="30s" start-delay="5s"
>> primitive R_NFS ocf:heartbeat:nfsserver \
>>     params nfs_init_script="/etc/init.d/nfs" \
>>     params nfs_notify_cmd="/sbin/rpc.statd" \
>>     params nfs_shared_infodir="/san_targets/www/nfsinfo" \
>>     op monitor interval="30s"
>> primitive drbd0 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd0" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s" \
>>     op start interval="0" timeout="30s" start-delay="10s"
>> primitive drbd1 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd1" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s" \
>>     op start interval="0" timeout="30s" start-delay="10s"
>> primitive drbd2 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd2" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s" \
>>     op start interval="0" timeout="30s" start-delay="10s"
>> primitive R_pingd ocf:pacemaker:pingd
>> primitive R_Failover_Alert_Init ocf:heartbeat:MailTo2 \
>>     params sender="[email protected]" \
>>     params email="[email protected],[email protected]" \
>>     params subject="ACS Init"
>> primitive R_Failover_Alert_Target ocf:heartbeat:MailTo2 \
>>     params sender="[email protected]" \
>>     params email="[email protected],[email protected]" \
>>     params subject="ACS San"
>> group G_Target R_IP_Target R_tgtd \
>>     meta target-role="Started"
>> group G_Init R_IP_Init R_iscsi R_LVM R_Filesystem R_NFS \
>>     meta target-role="Stopped"
>> ms ms-drbd0 drbd0 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-drbd1 drbd1 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-drbd2 drbd2 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> clone pingd R_pingd \
>>     meta target-role="Started"
>> clone Failover_Alert_Init R_Failover_Alert_Init \
>>     meta clone-max="2" target-role="Stopped"
>> clone Failover_Alert_Target R_Failover_Alert_Target \
>>     meta clone-max="2" target-role="Stopped"
>> location pingd-node-1 pingd 500: init1.mydomain.com
>> location pingd-node-2 pingd 500: init2.mydomain.com
>> location pingd-node-3 pingd 500: san1.mydomain.com
>> location pingd-node-4 pingd 500: san2.mydomain.com
>> location ms-drbd0-pref-1 ms-drbd0 200: san1.mydomain.com
>> location ms-drbd0-pref-2 ms-drbd0 100: san2.mydomain.com
>> location ms-drbd1-pref-1 ms-drbd1 200: san1.mydomain.com
>> location ms-drbd1-pref-2 ms-drbd1 100: san2.mydomain.com
>> location ms-drbd2-pref-1 ms-drbd2 200: san1.mydomain.com
>> location ms-drbd2-pref-2 ms-drbd2 100: san2.mydomain.com
>> location G_Target-pref-1 G_Target 200: san1.mydomain.com
>> location G_Target-pref-2 G_Target 100: san2.mydomain.com
>> location G_Init-pref-1 G_Init 200: init1.mydomain.com
>> location G_Init-pref-2 G_Init 100: init2.mydomain.com
>> location Failover-Alert-node1 Failover_Alert_Init 200: init1.mydomain.com
>> location Failover-Alert-node2 Failover_Alert_Init 100: init2.mydomain.com
>> location Failover-Alert-node3 Failover_Alert_Target 200: san1.mydomain.com
>> location Failover-Alert-node4 Failover_Alert_Target 100: san2.mydomain.com
>> colocation G_Target-on-ms-drbd0 inf: G_Target ms-drbd0:Master
>> colocation G_Target-on-ms-drbd1 inf: G_Target ms-drbd1:Master
>> colocation G_Target-on-ms-drbd2 inf: G_Target ms-drbd2:Master
>> order ms-drbd0-before-ms-drbd1 inf: ms-drbd0:promote ms-drbd1:promote
>> order ms-drbd1-before-ms-drbd2 inf: ms-drbd1:promote ms-drbd2:promote
>> order ms-drbd2-before-G_Target inf: ms-drbd2:promote G_Target:start
>> order G_Target-before-G_Init inf: G_Target:start G_Init:start
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.0.3-b133b3f19797c00f9189f4b66b513963f9d25db9" \
>>     stonith-enabled="false" \
>>     stonith-action="reboot" \
>>     stop-orphan-resources="true" \
>>     stop-orphan-actions="true" \
>>     symmetric-cluster="false" \
>>     last-lrm-refresh="1239899583" \
>>     default-resource-stickiness="INFINITY"
>>
>> Any ideas?
>> --
>> View this message in context:
>> http://www.nabble.com/DRBD-does-not-switch-resources-to-other-node-properly-tp23082432p23082432.html
>> Sent from the Linux-HA mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems

--
View this message in context: http://www.nabble.com/DRBD-does-not-switch-resources-to-other-node-properly-tp23082432p23084716.html
Sent from the Linux-HA mailing list archive at Nabble.com.
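
For anyone following the thread: Dejan's suggestion to look for lrmd.*drbd in the logs could be sketched as a small filter like the one below. This is only an illustration, not part of heartbeat or pacemaker. The function name drbd_ra_output and the regular expression are my own assumptions, based on the line format seen in the san2 excerpt quoted in this thread, and the sample lines are copied from that excerpt.

```python
import re

# Assumed line shape, taken from the san2 excerpt in this thread:
#   "... lrmd: [PID]: LEVEL: RA output: (drbdN:INSTANCE:OPERATION:stdout) MESSAGE"
LRMD_DRBD = re.compile(
    r"lrmd: \[\d+\]: \w+: RA output: "
    r"\((drbd\d+):\d+:(\w+):(?:stdout|stderr)\)\s*(.*)"
)

def drbd_ra_output(lines):
    """Yield (resource, operation, message) for lrmd RA-output lines about drbd."""
    for line in lines:
        match = LRMD_DRBD.search(line)
        if match:
            yield match.group(1), match.group(2), match.group(3).strip()

# Sample lines copied from the log excerpt in this thread.
sample = [
    "Apr 16 14:04:39 san2 lrmd: [12984]: info: rsc:drbd1:1: promote",
    "Apr 16 14:04:41 san2 lrmd: [12984]: info: RA output: (drbd1:1:promote:stdout) "
    "/dev/drbd1: State change failed: (-1) Multiple primaries not allowed by config",
    "Apr 16 14:04:42 san2 kernel: drbd1: conn( Unconnected -> WFConnection )",
]

for resource, operation, message in drbd_ra_output(sample):
    print(resource, operation, message)
```

To run it against a real log, pass an open file instead of the sample list, for example open("/var/log/messages"). Running it on both san1 and san2 around the same timestamps should show which node the resource agent believed was still primary when the promote was attempted.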
