Hi,

On Tue, Jan 29, 2008 at 01:25:57PM -0700, Daniel Stickney wrote:
> Hello everyone,
>
> Our setup: CentOS 5 (kernel 2.6.18-53), Heartbeat
> heartbeat-2.1.2-3.el5.centos, DRBD drbd-8.0.6-1.el5.centos
>
> We are running into a problem with getting the master DRBD resource to
> stick on a node it has failed onto. We have a simple 2 node cluster for
> demonstration of the issue, halinux1 and halinux2, with a single DRBD
> resource. What we are seeing is halinux2 selected as the Master node for
> DRBD on heartbeat startup, halinux1 as the slave. When halinux2 is placed
> into standby, the halinux1 is promoted to DRBD master as expected. When
> halinux2 is taken out of standby mode, halinux1 is demoted to secondary and
> halinux2 is promoted to master. We don't want this failback action. We want
> the DRBD master to stay on whatever node it is on unless there is a failure
> requiring it to move. We have default-resource-stickiness set to "infinity"
> in our cib.xml file. I repeated this experiment with a single IP address
> resource (no DRBD), and the stickiness of infinity worked exactly as
> expected: the IP stayed on whatever node it was on unless there was a
> failure (or standby mode) on the local node requiring the IP to move, so
> that was a positive confirmation that outside of our testing with DRBD, the
> stickiness of infinity works. We would very much appreciate suggestions on
> how we might go about resolving this issue.

The multistate resources should have been much improved in version
2.1.3. Johnny Hughes, the CentOS heartbeat maintainer, has 2.1.3
available and is looking for testers:

http://marc.info/?l=linux-ha&m=120110530418348&w=2

Thanks,

Dejan

> Here is the cib.xml file:
> ----------------------------------
> <cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false"
>   num_peers="2" cib_feature_revision="1.3" epoch="35" num_updates="1"
>   cib-last-written="Tue Jan 29 12:36:17 2008" ccm_transition="2"
>   dc_uuid="d2c440e4-9668-4a70-b7e2-de7f52834325">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cluster_defaults">
>         <attributes>
>           <nvpair name="default-resource-stickiness" id="stickiness"
>             value="INFINITY"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node uname="halinux2" type="normal"
>         id="216a5f87-c472-4ce6-a3f1-7ce4f6dc1bae">
>         <instance_attributes
>           id="nodes-216a5f87-c472-4ce6-a3f1-7ce4f6dc1bae">
>           <attributes>
>             <nvpair name="standby"
>               id="standby-216a5f87-c472-4ce6-a3f1-7ce4f6dc1bae" value="false"/>
>           </attributes>
>         </instance_attributes>
>       </node>
>       <node uname="halinux1" type="normal"
>         id="d2c440e4-9668-4a70-b7e2-de7f52834325">
>         <instance_attributes
>           id="nodes-d2c440e4-9668-4a70-b7e2-de7f52834325">
>           <attributes>
>             <nvpair name="standby"
>               id="standby-d2c440e4-9668-4a70-b7e2-de7f52834325" value="false"/>
>           </attributes>
>         </instance_attributes>
>       </node>
>     </nodes>
>     <resources>
>       <master_slave id="ms-drbd0">
>         <meta_attributes id="ma-ms-drbd0">
>           <attributes>
>             <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
>             <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
>             <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
>             <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
>             <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
>             <nvpair id="ma-ms-drbd0-6" name="globally_unique"
>               value="false"/>
>             <nvpair id="ma-ms-drbd0-7" name="target_role" value="started"/>
>           </attributes>
>         </meta_attributes>
>         <primitive id="DRBD" class="ocf" provider="heartbeat" type="drbd">
>           <instance_attributes id="ia-DRBD">
>             <attributes>
>               <nvpair id="ia-DRBD-1" name="drbd_resource" value="mysql"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>       </master_slave>
>     </resources>
>     <constraints/>
>   </configuration>
> </cib>
> ----------------------------------
> =========================================================================
>
> Here is our ha.cf file:
> ----------------------------------
> use_logd yes
> udpport 695
> bcast eth0
> node halinux1
> node halinux2
> crm on
> ----------------------------------
> =========================================================================
>
> Here is a link to the /var/log/messages output on halinux1 starting from
> the time when halinux2 comes out of standby mode and the unwanted failback
> occurs: http://pastebin.com/m6e55f6b3
>
> Thank you in advance for your time,
> -Daniel
>
> --
> Daniel Stickney - Linux Systems Administrator
> Email: [EMAIL PROTECTED]
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
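
When comparing configurations before and after an upgrade, it can help to confirm which values are actually present in the CIB. The following is only a minimal sketch, not a Heartbeat tool: it uses Python's standard xml.etree module on a trimmed copy of the cib.xml quoted above and lists the cluster properties, the meta attributes of each master_slave resource, and the number of constraints. The function name summarize_cib and the constant CIB_XML are made up for this illustration. It only reads the XML and says nothing about how the policy engine scores promotion.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cib.xml from the quoted message (nodes section omitted).
CIB_XML = """
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cluster_defaults">
        <attributes>
          <nvpair name="default-resource-stickiness" id="stickiness"
            value="INFINITY"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <resources>
      <master_slave id="ms-drbd0">
        <meta_attributes id="ma-ms-drbd0">
          <attributes>
            <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
            <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
            <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
            <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
            <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
            <nvpair id="ma-ms-drbd0-6" name="globally_unique" value="false"/>
            <nvpair id="ma-ms-drbd0-7" name="target_role" value="started"/>
          </attributes>
        </meta_attributes>
        <primitive id="DRBD" class="ocf" provider="heartbeat" type="drbd">
          <instance_attributes id="ia-DRBD">
            <attributes>
              <nvpair id="ia-DRBD-1" name="drbd_resource" value="mysql"/>
            </attributes>
          </instance_attributes>
        </primitive>
      </master_slave>
    </resources>
    <constraints/>
  </configuration>
</cib>
"""


def _pairs(nvpairs):
    """Turn a list of <nvpair> elements into a {name: value} dict."""
    return {nv.get("name"): nv.get("value") for nv in nvpairs}


def summarize_cib(cib_text):
    """Return (cluster properties, master_slave meta attributes, constraint count).

    Hypothetical helper for illustration; it only inspects the XML layout
    used in the quoted cib.xml.
    """
    root = ET.fromstring(cib_text)
    props = _pairs(root.findall(
        "configuration/crm_config/cluster_property_set/attributes/nvpair"))
    ms_meta = {}
    for ms in root.findall("configuration/resources/master_slave"):
        ms_meta[ms.get("id")] = _pairs(
            ms.findall("meta_attributes/attributes/nvpair"))
    constraints = root.find("configuration/constraints")
    n_constraints = 0 if constraints is None else len(list(constraints))
    return props, ms_meta, n_constraints


if __name__ == "__main__":
    props, ms_meta, n_constraints = summarize_cib(CIB_XML)
    print("cluster properties:", props)
    for ms_id, meta in ms_meta.items():
        print("master_slave", ms_id, "meta attributes:", meta)
    print("constraints defined:", n_constraints)
```

Running the same function against the full file on each node (for example by reading it with open(...).read()) gives a quick way to check that both nodes carry the same stickiness and master/slave settings.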
