I've got a two-node HA cluster running Heartbeat 2.1.3 on Ubuntu Hardy. The cluster runs drbd8 in a master/slave configuration, plus a filesystem, an IP address, and a PostgreSQL database server. Everything is set up and working perfectly except for one thing:

While testing the various failure scenarios to confirm we get the expected behavior, we found that if we shut both nodes down and then started only one of them back up, Heartbeat complained that it couldn't place the other resource clone for drbd anywhere; because of that, it couldn't start any of the other resources and made no further progress. Shortly afterwards, these lines appeared in the log:

heartbeat[4691]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
ccm[4780]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
mgmtd[4786]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
tengine[4790]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
crmd[4785]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
pengine[4791]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
lrmd[4782]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4744]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
attrd[4784]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
stonithd[4783]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
cib[4781]: 2008/10/31_15:06:06 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4747]: 2008/10/31_15:06:07 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4745]: 2008/10/31_15:06:07 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4749]: 2008/10/31_15:06:07 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4748]: 2008/10/31_15:06:07 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4746]: 2008/10/31_15:06:07 info: time_longclock: clock_t wrapped around (uptime).
heartbeat[4750]: 2008/10/31_15:06:07 info: time_longclock: clock_t wrapped around (uptime).

I found that if I changed the globally_unique attribute on the master/slave resource to true, the original problem went away, but a new one appeared: when the other node was brought back up, all resources would immediately transition to it. That is far from ideal, since that node might still be in the middle of a DRBD sync, in which case the resources would fail to start. Is this behavior expected, and is there anything we can do to mitigate it? My ha.cf and CIB are below. Thanks for any help you can give.
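For reference, one mitigation I wondered about (completely untested; the ids and the score value below are placeholders I made up, not from my config) is adding a mild location preference so that resources don't automatically migrate when the second node rejoins, something like:

```xml
<!-- Hypothetical sketch only: give ms-drbd0 a finite preference for one
     node, so a rejoining peer doesn't immediately win placement.
     All id values and the score are invented placeholders. -->
<rsc_location id="ms-drbd0-prefer-db3" rsc="ms-drbd0">
  <rule id="ms-drbd0-prefer-db3-rule" score="100">
    <expression id="ms-drbd0-prefer-db3-expr" attribute="#uname" operation="eq" value="db3"/>
  </rule>
</rsc_location>
```

I'm not sure how such a rule would interact with default-resource-stickiness=INFINITY, though, so I'd welcome correction.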

Adrian

--

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
autojoin none
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
bcast   eth1 eth2 eth0  # Linux
auto_failback off
crm on
node db3
node db4

--

<cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" generated="true" epoch="21" num_updates="113" cib-last-written="Fri Oct 31 15:23:00 2008" ccm_transition="2" dc_uuid="93932938-b211-4ffc-ab4e-9e8193afddaf">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
           <nvpair id="crm_config_1" name="default-resource-stickiness" value="INFINITY"/>
           <nvpair id="crm_config_2" name="default-resource-failure-stickiness" value="-INFINITY"/>
           <nvpair id="crm_config_3" name="no-quorum-policy" value="ignore"/>
           <nvpair id="crm-config_4" name="symmetric-cluster" value="true"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="975c9f86-1dac-4ca2-924d-5f2196672367" uname="db4" type="normal"/>
       <node id="93932938-b211-4ffc-ab4e-9e8193afddaf" uname="db3" type="normal"/>
     </nodes>
     <resources>
       <primitive class="ocf" provider="heartbeat" type="pgsql" id="pgsql">
         <operations>
           <op id="pgsql-mon" interval="5s" timeout="15s" name="monitor"/>
         </operations>
         <instance_attributes id="pgsql-ia">
           <attributes>
             <nvpair name="pgctl" value="/usr/lib/postgresql/8.3/bin/pg_ctl" id="pgsql-ia-1"/>
             <nvpair name="pgdata" value="/var/lib/postgresql/8.3/main" id="pgsql-ia-2"/>
             <nvpair name="psql" value="/usr/bin/psql" id="pgsql-ia-3"/>
             <nvpair id="pgsql-ia-4" name="logfile" value="/var/log/postgresql/postgresql-8.3-main.log"/>
             <nvpair id="pgsql-ia-5" name="pgdba" value="postgres"/>
           </attributes>
         </instance_attributes>
       </primitive>
       <master_slave id="ms-drbd0">
         <meta_attributes id="ms-drbd0-ia">
           <attributes>
             <nvpair id="ms-drbd0-ia-1" name="clone_max" value="2"/>
             <nvpair id="ms-drbd0-ia-2" name="clone_node_max" value="1"/>
             <nvpair id="ms-drbd0-ia-3" name="master_max" value="1"/>
             <nvpair id="ms-drbd0-ia-4" name="master_node_max" value="1"/>
              <nvpair id="ms-drbd0-ia-5" name="globally_unique" value="true"/>
             <nvpair id="ms-drbd0-ia-6" name="notify" value="yes"/>
           </attributes>
         </meta_attributes>
          <primitive class="ocf" id="drbd_r0" provider="heartbeat" type="drbd">
           <operations>
              <op id="drbd_r0_mon" interval="5s" name="monitor" timeout="15s"/>
           </operations>
           <instance_attributes id="drbd_r0_ia">
             <attributes>
               <nvpair id="drbd_r0_ia_1" name="drbd_resource" value="r0"/>
             </attributes>
           </instance_attributes>
         </primitive>
       </master_slave>
        <primitive class="ocf" id="IPaddr_10_0_0_252" provider="heartbeat" type="IPaddr">
         <operations>
            <op id="IPaddr_10_0_0_252_mon" interval="5s" name="monitor" timeout="15s"/>
         </operations>
         <instance_attributes id="IPaddr_10_0_0_252_inst_attr">
           <attributes>
              <nvpair id="IPaddr_10_0_0_252_attr_0" name="ip" value="10.0.0.252"/>
              <nvpair id="IPaddr_10_0_0_252_attr_1" name="netmask" value="24"/>
              <nvpair id="IPaddr_10_0_0_252_attr_2" name="nic" value="eth0"/>
           </attributes>
         </instance_attributes>
       </primitive>
        <primitive class="ocf" id="fs_postgres" provider="heartbeat" type="Filesystem">
         <operations>
            <op id="fs_postgres-ops-1" interval="5s" name="monitor" timeout="15s"/>
         </operations>
         <instance_attributes id="fs_postgres-ia">
           <attributes>
              <nvpair id="fs_postgres-ia-1" name="device" value="/dev/drbd0"/>
              <nvpair id="fs_postgres-ia-2" name="directory" value="/var/lib/postgresql"/>
              <nvpair id="fs_postgres-ia-3" name="fstype" value="ext3"/>
              <nvpair id="fs_postgres-ia-4" name="options" value="defaults,noauto"/>
           </attributes>
         </instance_attributes>
       </primitive>
     </resources>
     <constraints>
       <rsc_order id="fs-after-drbd" from="fs_postgres" action="start" to="ms-drbd0" to_action="promote"/>
       <rsc_order id="ip-after-fs" from="IPaddr_10_0_0_252" action="start" to="fs_postgres"/>
       <rsc_order id="pgsql-after-ip" from="pgsql" action="start" to="IPaddr_10_0_0_252"/>
       <rsc_colocation id="ip-with-drbd" from="IPaddr_10_0_0_252" to="ms-drbd0" to_role="master" score="INFINITY"/>
       <rsc_colocation id="fs-with-drbd" from="fs_postgres" to="ms-drbd0" to_role="master" score="INFINITY"/>
       <rsc_colocation id="pgsql-with-drbd" from="pgsql" to="ms-drbd0" to_role="master" score="INFINITY"/>
     </constraints>
   </configuration>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
