Re: [Linux-HA] [ HELP ] pingd not failover (Active/Standby)

chiu chun chir Mon, 30 Apr 2007 20:21:17 -0700

I'm sorry that I didn't explain my HA configuration very well.
First, I want to explain why I changed the system's network
configuration.
The host is not near me; it is a pilot system running at a client site, so I
cannot unplug the network cable myself.
So I had to simulate a break in the network connection to the outside world.
That's why I changed the network settings to a wrong IP address that cannot
reach my PingNode (the default gateway).


There are two nodes in my cluster environment.
Host tacomcs1 is the active node.
Host tacomcs2 is the standby (passive) node.

The main business resources are defined as the group TACO_SERVICES.
It consists of IPaddr2, Tomcat, and a Java RMI server.

      <group id="TACO_SERVICES">
        <primitive id="VIP" class="ocf" type="IPaddr2" provider="heartbeat">
          <instance_attributes id="VIP_instance_attrs">
            <attributes>
              <nvpair id="VIP_target_role" name="target_role" value="started"/>
              <nvpair id="978e63ce-c11b-473c-9cd4-22f109e25a98" name="ip" value="10.31.70.7"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="lsb" type="tacormi" provider="heartbeat" id="TACORMI">
          <instance_attributes id="TACORMI_instance_attrs">
            <attributes>
              <nvpair name="target_role" id="TACORMI_target_role" value="started"/>
            </attributes>
          </instance_attributes>
          <operations>
            <op id="02df13ab-ea17-4142-bd5e-3f1e815c3178" name="monitor" interval="10s" timeout="60s"/>
          </operations>
        </primitive>
        <primitive class="lsb" type="tomcat" provider="heartbeat" id="TOMCAT">
          <instance_attributes id="TOMCAT_instance_attrs">
            <attributes>
              <nvpair name="target_role" id="TOMCAT_target_role" value="started"/>
            </attributes>
          </instance_attributes>
          <operations>
            <op id="a4d6e1e9-c733-45ab-a407-0713d4e3be19" name="monitor" interval="10s" timeout="60s"/>
          </operations>
        </primitive>
        <instance_attributes id="TACO_SERVICES_instance_attrs">
          <attributes>
            <nvpair id="TACO_SERVICES_target_role" name="target_role" value="started"/>
          </attributes>
        </instance_attributes>
      </group>

My expected scenario is the following:
If I unplug tacomcs1's network cable, TACO_SERVICES will fail over to
tacomcs2.
Once I plug tacomcs1's network cable back in, it will become the standby node,
and TACO_SERVICES should stay on tacomcs2
until I unplug its network cable or shut it down;
then TACO_SERVICES will fail over to tacomcs1.
A simple Active/Passive configuration.

I also set "auto_failback off" in my ha.cf because of the Active/Passive scenario.

# I want services to remain on a node until that node cannot reach the
# outside world or is shut down.
# cib.xml
<nvpair name="default_resource_stickiness" id="cibbootstrap-02" value="INFINITY"/>

# I want all resources to fail over to the standby node if the active node fails.
# cib.xml
<nvpair id="cibbootstrap-03" name="default_resource_failure_stickiness" value="-INFINITY"/>


There is also a pingd clone set running on both nodes,
with dampen (5s) and multiplier (100).
I thought it would add a score of 100 to each node every 5 seconds. Am I right?
# cib.xml
      <clone id="IP_MON_CLONESET">
        <instance_attributes id="IP_MON_CLONESET_instance_attrs">
          <attributes>
            <nvpair id="IP_MON_CLONESET_clone_max" name="clone_max" value="2"/>
            <nvpair name="clone_node_max" id="IP_MON_CLONESET_clone_node_max" value="1"/>
          </attributes>
        </instance_attributes>
        <primitive id="DEFAULT_GATEWAY_MONITOR" class="ocf" type="pingd" provider="heartbeat">
          <instance_attributes id="DEFAULT_GATEWAY_MONITOR_instance_attrs">
            <attributes>
              <nvpair id="DEFAULT_GATEWAY_MONITOR_target_role" name="target_role" value="started"/>
              <nvpair id="50431d4a-9d21-4cc6-bc32-8570638bdf8f" name="dampen" value="5s"/>
              <nvpair id="bdf08a1f-4f66-4ac1-a56d-7dfb482ae984" name="multiplier" value="100"/>
            </attributes>
          </instance_attributes>
          <operations>
            <op id="6c30c8f0-e497-4a04-91a7-2d174c27d4d2" name="monitor" interval="5s" timeout="10s"/>
          </operations>
        </primitive>
      </clone>
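As I understand the pingd documentation, pingd does not add points over time. Treat this as an assumption, not as confirmed behaviour of heartbeat-2.0.7. On each check it publishes a node attribute equal to multiplier × the number of reachable ping nodes. dampen only delays how long a changed value waits before it is written to the CIB. The minimal Python sketch below models that assumed formula with the single PingNode from ha.cf. The function name `pingd_attribute` is hypothetical.

```python
from __future__ import annotations

# Hypothetical model of the pingd node attribute (assumed formula, not heartbeat code).
MULTIPLIER = 100             # "multiplier" nvpair from the cib.xml above
PING_NODES = ["10.31.70.1"]  # PingNode (default gateway) configured in ha.cf


def pingd_attribute(reachable: set[str]) -> int:
    """Assumed value: multiplier x number of configured ping nodes that answer."""
    return MULTIPLIER * sum(1 for node in PING_NODES if node in reachable)


# Each monitor run recomputes the value from scratch; nothing accumulates.
timeline = [
    ("t=0s, gateway reachable", {"10.31.70.1"}),
    ("t=5s, gateway reachable", {"10.31.70.1"}),
    ("t=10s, cable unplugged", set()),
]
for label, reachable in timeline:
    print(label, "->", pingd_attribute(reachable))
```

Under this model, the value stays at 100 while the gateway answers and drops to 0 when it does not. It never climbs to 200 or 300.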


Finally, a resource location constraint.
# cib.xml
      <rsc_location id="my_resource:connected" rsc="TACO_SERVICES">
        <rule id="my_resource:connected:rule" score_attribute="pingd">
          <expression id="my_resource:connected:expr:gateway" attribute="pingd" operation="defined" value="10.31.70.1"/>
        </rule>
      </rsc_location>

10.31.70.1 is my default gateway's IP address.
My understanding is that if pingd can reach the gateway, it
will add a score of 100 to the current machine.
So at the start:
Score of tacomcs1 will be 100.
Score of tacomcs2 will be 100.

After 5s (dampen):
Score of tacomcs1 will be 200.
Score of tacomcs2 will be 200.

After I unplug tacomcs1's network cable, 5s later:
Score of tacomcs1 will be 200.
Score of tacomcs2 will be 300.

And the resource group TACO_SERVICES should fail over to tacomcs2.
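The walkthrough above compares only the pingd scores, but the stickiness settings also contribute to where the group runs. The minimal sketch below rests on several assumptions:

* The CRM treats INFINITY as 1000000.
* When scores are added, -INFINITY wins over +INFINITY.
* default_resource_stickiness counts only on the node already running the group.
* The pingd values are 0 on the unplugged node and 100 on the other.

`add_scores` and `placement_score` are hypothetical helpers, not heartbeat APIs.

```python
INFINITY = 1_000_000  # assumed value of INFINITY in the heartbeat 2 CRM


def add_scores(a: int, b: int) -> int:
    """Add two scores under the assumed rules: -INFINITY wins, then +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def placement_score(pingd: int, runs_here: bool, stickiness: int) -> int:
    # With score_attribute="pingd" the rule's score is the pingd attribute value.
    score = pingd
    # Stickiness is assumed to apply only where the group is currently running.
    if runs_here:
        score = add_scores(score, stickiness)
    return score


# After unplugging tacomcs1 (assumed pingd: 0 on tacomcs1, 100 on tacomcs2).
for stickiness in (INFINITY, 0):
    s1 = placement_score(0, runs_here=True, stickiness=stickiness)
    s2 = placement_score(100, runs_here=False, stickiness=stickiness)
    winner = "tacomcs1" if s1 >= s2 else "tacomcs2"
    print(f"stickiness={stickiness}: tacomcs1={s1}, tacomcs2={s2} -> {winner}")
```

Under these assumptions, INFINITY stickiness outweighs any finite pingd score, so the group would stay on tacomcs1. With stickiness 0, the pingd difference alone decides.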

Are my assumptions wrong?
Please let me know if I got something wrong.




2007/4/30, Alan Robertson <[EMAIL PROTECTED]>:


chiu chun chir wrote:
> Dear Masters,
>
> Sorry, I forgot the attachment in my last letter...
>
>
> I've set up a cluster with 2 nodes (tacomcs1-Active and tacomcs2-Standby).
>
> The OS is SUSE Enterprise Server 10, upgraded to heartbeat-2.0.7-1.2.
>
> I want all services to fail over to tacomcs2 if tacomcs1 cannot reach the
> outside world network.
>
> And all services should stay on tacomcs2 until it fails or is rebooted.
>
>
>
> I've followed the resource constraint illustrated at
> http://www.linux-ha.org/pingd.
>
> Topic -> Quickstart - Run my resource on the node with the best
> connectivity.
>
>
>
> And made a similar setting according to the guide:
>
> <rsc_location id="my_resource:connected" rsc="my_resource">
>  <rule id="my_resource:connected:rule" score_attribute="pingd" >
>    <expression id="my_resource:connected:expr:defined"
>      attribute="pingd" operation="defined"/>
>  </rule>
> </rsc_location>
>
>
>
> I used `yast` to change the IP of tacomcs1 (the active node) to an IP
> address that cannot reach the gateway address 10.31.70.1 (configured in
> ha.cf as a PingNode).
>
> The first time, the group named 'TACO_SERVICES' failed over to tacomcs2
> (the standby node).
>
> After it failed over, I changed tacomcs1 back to the correct IP address,
> 10.31.70.8, which can reach the gateway.
>
> But in the meantime, tacomcs1 became OFFLINE (as shown by $ crm_mon -1).
>
>
>
> Then I restarted the HEARTBEAT service on both tacomcs1 and tacomcs2,
>
> and used `yast` to change the IP address of tacomcs1 (the active node) to a
> wrong IP (one that can't reach the gateway).
>
> This time it does not fail over to tacomcs2 anymore.
>
> On the contrary, tacomcs1 believes tacomcs2 is OFFLINE (via $ crm_mon -1).

Don't change your running system's network configuration.

But, I don't quite understand what it is you did, or why you did it.

--
   Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems