Hi,
I have 3 NAT servers, and "Heartbeat 2.0.8" has been installed on all 3 for
High Availability.
The nodes each have 2 network interfaces that are connected to separate
networks, as you can see in the picture.
The IP's and node names that I am using are examples.
I have added my "ha.cf" and "cib.xml" files below the picture. This is only
for HANODE1.
The only thing that changes on HANODE2 and HANODE3 is the "ucast" IPs in the
"ha.cf" file.
The "ucast" IPs on HANODE2 and HANODE3 will be configured to point to the
other nodes.
IP 72.10.0.10 will be the virtual IP for the interfaces connected on the
72.10.0.0 network.
IP 57.20.1.10 will be the virtual IP for the interfaces connected on the
57.20.1.0 network.
So if one of the interfaces goes down on the 1st node, the virtual IPs must
fail over to the 2nd node.
If one of the interfaces on the 2nd node goes down, the virtual IP's must
failover to the 3rd node.
If all of the interfaces come back up, the virtual IPs must fail back to
the 1st node.
My problem at the moment is that the failover and failback is not working.
If I take out one of the network cables on HANODE1, the virtual IPs do not
fail over to the 2nd or 3rd node.
If I take out both of the network cables on HANODE1, the virtual IP
72.10.0.10 fails over to HANODE3 and the
virtual IP 57.20.1.10 fails over to HANODE2. I want both virtual IPs to
fail over to HANODE2. If I then remove
both network cables from HANODE2, the virtual IPs do not fail over to
HANODE3, and I cannot ping the virtual
IPs. But if I plug all of the network cables back in, the virtual IPs
fail back to HANODE1. WEIRD!!!!!
Can Anyone PLEASE HELP!!!!!
72.10.0.10
_______________________________________________________________________________________________
|
| |
|ETH0
|ETH0 |ETH0
|72.10.0.2
|72.10.0.3 |72.10.0.4
____|_______
_____|______ _____|_________
| |
| | |
|
| |
| | |
|
| HANODE1 |
| HANODE2 | | HANODE3 |
| |
| | |
|
| |
| | |
|
| |
| | |
|
|__________|
|__________| |______________|
|
| |
| ETH1
| ETH1 |ETH1
| 57.20.1.2
| 57.20.1.3 |57.20.1.4
______________________|__________________
_______|____________________________|_______________
57.20.1.10
ha.cf on HANODE1:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
initdead 120
udpport 694
ucast eth0 72.10.0.3
ucast eth0 72.10.0.4
ucast eth1 57.20.1.3
ucast eth1 57.20.1.4
auto_failback on
node HANODE1
node HANODE2
node HANODE3
ping 72.10.0.1 57.20.1.1 #(This is the default gateway of each network.)
respawn hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s
crm yes
cib.xml:
<cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="3"
ccm_transition="3" cib_feature_revision="1.3" generated="true"
dc_uuid="d48a7c89-8d7d-4b3f-9c4e-6309e712ecc0" epoch="4" num_updates="52"
cib-last-written="Wed May 30 17:14:17 2007">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-symmetric-cluster"
name="symmetric-cluster" value="true"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="stop"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="0"/>
<nvpair
id="cib-bootstrap-options-default-resource-failure-stickiness"
name="default-resource-failure-stickiness" value="0"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-stonith-action"
name="stonith-action" value="reboot"/>
<nvpair id="cib-bootstrap-options-stop-orphan-resources"
name="stop-orphan-resources" value="true"/>
<nvpair id="cib-bootstrap-options-stop-orphan-actions"
name="stop-orphan-actions" value="true"/>
<nvpair id="cib-bootstrap-options-remove-after-stop"
name="remove-after-stop" value="false"/>
<nvpair id="cib-bootstrap-options-short-resource-names"
name="short-resource-names" value="true"/>
<nvpair id="cib-bootstrap-options-transition-idle-timeout"
name="transition-idle-timeout" value="5min"/>
<nvpair id="cib-bootstrap-options-default-action-timeout"
name="default-action-timeout" value="5s"/>
<nvpair id="cib-bootstrap-options-is-managed-default"
name="is-managed-default" value="true"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node id="d48a7c89-8d7d-4b3f-9c4e-6309e712ecc0" uname="HANODE3"
type="normal"/>
<node id="fb8809d5-805e-42a8-a0dd-6e80f64ebce7" uname="HANODE2"
type="normal"/>
<node id="ef2928d0-998d-4e16-9785-07753792cc64" uname="HANODE1"
type="normal"/>
</nodes>
<resources>
<primitive class="ocf" id="IPaddr_72_10_0_10" provider="heartbeat"
type="IPaddr">
<operations>
<op id="IPaddr_72_10_0_10_mon" interval="5s" name="monitor"
timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_72_10_0_10_inst_attr">
<attributes>
<nvpair id="IPaddr_72_10_0_10_attr_0" name="ip" value="72.10.0.10"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="ocf" id="IPaddr_57_20_1_10" provider="heartbeat"
type="IPaddr">
<operations>
<op id="IPaddr_57_20_1_10_mon" interval="5s" name="monitor"
timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_57_20_1_10_inst_attr">
<attributes>
<nvpair id="IPaddr_57_20_1_10_attr_0" name="ip" value="57.20.1.10"/>
</attributes>
</instance_attributes>
</primitive>
</resources>
<constraints>
<rsc_location id="rsc_location_IPaddr_72_10_0_10"
rsc="IPaddr_72_10_0_10">
<rule id="prefered_location_IPaddr_72_10_0_10" score="100">
<expression attribute="#uname"
id="prefered_location_IPaddr_72_10_0_10_expr" operation="eq"
value="HANODE1"/>
</rule>
</rsc_location>
<rsc_location id="rsc_location_IPaddr_57_20_1_10"
rsc="IPaddr_57_20_1_10">
<rule id="prefered_location_IPaddr_57_20_1_10" score="100">
<expression attribute="#uname"
id="prefered_location_IPaddr_57_20_1_10_expr" operation="eq"
value="HANODE1"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
Kind regards
Divan Booyens
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems