[Linux-HA] [ HELP ] pingd not failover (Active/Standby)

chiu chun chir Mon, 30 Apr 2007 03:22:26 -0700

Dear Masters,

Sorry, I forgot the attachment in my last message.



I've set up a cluster with 2 nodes (tacomcs1 as Active and tacomcs2 as Standby).

The OS is SUSE Linux Enterprise Server 10, with heartbeat upgraded to heartbeat-2.0.7-1.2.

If tacomcs1 cannot reach the outside network, I want all services to fail over to tacomcs2.

All services should then stay on tacomcs2 until it fails or is rebooted.



I followed the resource constraint described at http://www.linux-ha.org/pingd,
under the topic "Quickstart - Run my resource on the node with the best
connectivity".



I made a similar setting based on the guide's example:

<rsc_location id="my_resource:connected" rsc="my_resource">
 <rule id="my_resource:connected:rule" score_attribute="pingd" >
   <expression id="my_resource:connected:expr:defined"
     attribute="pingd" operation="defined"/>
 </rule>
</rsc_location>
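
The rule above can be read as: on every node where the `pingd` attribute is defined, use that attribute's value as the location score for the resource. The sketch below is a simplified Python model of that reading, not the actual policy engine. It assumes pingd publishes (number of reachable ping nodes) x (multiplier), takes the multiplier of 100 and the single ping node from the cib.xml attached to this message, and ignores resource stickiness and every other score the cluster combines. The function names are hypothetical.

```python
from typing import Dict, Optional


def pingd_attribute(reachable_ping_nodes: int, multiplier: int) -> int:
    """Assumed value of the node attribute: reachable ping nodes times multiplier."""
    return reachable_ping_nodes * multiplier


def location_score(node_attrs: Dict[str, int], attribute: str = "pingd") -> Optional[int]:
    """Model a rule with score_attribute=<attribute> guarded by operation="defined".

    Returns None when the attribute is not defined (the rule does not apply),
    otherwise the attribute value, which is used as the score.
    """
    if attribute not in node_attrs:
        return None
    return node_attrs[attribute]


# Scenario from the post: tacomcs1 has lost the gateway, tacomcs2 still reaches it.
nodes = {
    "tacomcs1": {"pingd": pingd_attribute(0, multiplier=100)},
    "tacomcs2": {"pingd": pingd_attribute(1, multiplier=100)},
}

scores = {name: location_score(attrs) for name, attrs in nodes.items()}

# Only nodes where the rule applied take part in the comparison.
eligible = {name: score for name, score in scores.items() if score is not None}
preferred = max(eligible, key=eligible.get)

print(scores)
print("preferred by this rule alone:", preferred)
```

In this model the disconnected node still has the attribute defined, only with the value 0, so the rule still matches there and simply contributes nothing to that node's score.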



I then used `yast` to change the IP address of tacomcs1 (the Active node)

to one that cannot reach the gateway address 10.31.70.1, which is
configured in ha.cf as a ping node.



The first time, the group named 'TACO_SERVICES' failed over to tacomcs2
(the Standby node).

After it failed over, I changed tacomcs1 back to the correct IP address,
10.31.70.8, which can reach the gateway.

In the meantime, however, tacomcs1 became OFFLINE (as shown by $ crm_mon -1).



I then restarted the HEARTBEAT service on both tacomcs1 and tacomcs2,

and again used `yast` to change the IP address of tacomcs1 (the Active node)
to a wrong one that cannot reach the gateway.

This time the services did not fail over to tacomcs2.

On the contrary, tacomcs1 reported tacomcs2 as OFFLINE in $ crm_mon -1.



I'm confused and cannot figure out what is going wrong.

I've attached my settings. Would you please help verify them?

Is there something wrong with ha.cf or cib.xml? Both files follow, ha.cf first and then cib.xml.

autojoin any
crm true
bcast eth1
node tacomcs2
node tacoMCS1
#respawn root /sbin/evmsd
apiauth evms,pingd uid=hacluster,root

#Stick on this machine
auto_failback off

#LAN_FAIL_MONITOR
ping 10.31.70.1
keepalive 2 # 2 seconds

#logging
use_logd on
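
As a reading aid for the ha.cf above, the following Python sketch splits the file into directives and lists the heartbeat medium, the ping node, and the node names. It is a deliberately simplified parser written for this post (the real ha.cf grammar is richer), comment-only lines are left out of the embedded copy, and the node names it compares against are the `uname` values from the cib.xml attached to this message. It only surfaces a spelling difference in one node name. Whether Heartbeat treats node names case-sensitively is not established here.

```python
# Directive lines copied from the ha.cf in this post (comment-only lines omitted).
HA_CF = """\
autojoin any
crm true
bcast eth1
node tacomcs2
node tacoMCS1
apiauth evms,pingd uid=hacluster,root
auto_failback off
ping 10.31.70.1
keepalive 2 # 2 seconds
use_logd on
"""


def parse_ha_cf(text):
    """Return {directive: [values]}; trailing '#' comments are stripped."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        directives.setdefault(key, []).append(value.strip())
    return directives


conf = parse_ha_cf(HA_CF)

# uname values as they appear in the <nodes> section of the attached cib.xml.
cib_unames = ["tacomcs1", "tacomcs2"]

# Names that match a CIB uname only after lowercasing.
case_mismatches = [
    name for name in conf["node"]
    if name not in cib_unames and name.lower() in cib_unames
]

print("heartbeat media:", conf["bcast"])
print("ping nodes:", conf["ping"])
print("node names differing only by case:", case_mismatches)
```

The output also makes visible that the file lists a single heartbeat medium (`bcast eth1`) and a single ping node.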

 <cib admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" generated="true" ccm_transition="4" dc_uuid="bd265fc7-06f3-4955-9340-2b75886d103b" epoch="48" num_updates="1618" cib-last-written="Mon Apr 30 15:26:00 2007">
   <configuration>
     <crm_config>
       <cluster_property_set id="cibbootstrap">
         <attributes>
           <nvpair id="cibbootstrap-01" name="transition_idle_timeout" value="60s"/>
           <nvpair id="cibbootstrap-13" name="default_action_timeout" value="5s"/>
           <nvpair name="default_resource_stickiness" id="cibbootstrap-02" value="INFINITY"/>
           <nvpair id="cibbootstrap-03" name="default_resource_failure_stickiness" value="-INFINITY"/>
           <nvpair id="cibbootstrap-04" name="stonith_enabled" value="false"/>
           <nvpair id="cibbootstrap-05" name="stonith_action" value="reboot"/>
           <nvpair id="cibbootstrap-06" name="symmetric_cluster" value="true"/>
           <nvpair id="cibbootstrap-07" name="no_quorum_policy" value="stop"/>
           <nvpair id="cibbootstrap-08" name="stop_orphan_resources" value="true"/>
           <nvpair id="cibbootstrap-09" name="stop_orphan_actions" value="true"/>
           <nvpair id="cibbootstrap-10" name="is_managed_default" value="true"/>
           <nvpair id="cibbootstrap-11" name="remove_after_stop" value="false"/>
           <nvpair id="cibbootstrap-12" name="short_resource_names" value="true"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="a44cc92c-352f-445d-ba2f-122487806ab0" uname="tacomcs1" type="normal"/>
       <node id="bd265fc7-06f3-4955-9340-2b75886d103b" uname="tacomcs2" type="normal"/>
     </nodes>
     <resources>
       <clone id="IP_MON_CLONESET">
         <instance_attributes id="IP_MON_CLONESET_instance_attrs">
           <attributes>
             <nvpair id="IP_MON_CLONESET_clone_max" name="clone_max" value="2"/>
             <nvpair name="clone_node_max" id="IP_MON_CLONESET_clone_node_max" value="1"/>
           </attributes>
         </instance_attributes>
         <primitive id="DEFAULT_GATEWAY_MONITOR" class="ocf" type="pingd" provider="heartbeat">
           <instance_attributes id="DEFAULT_GATEWAY_MONITOR_instance_attrs">
             <attributes>
               <nvpair id="DEFAULT_GATEWAY_MONITOR_target_role" name="target_role" value="started"/>
               <nvpair id="50431d4a-9d21-4cc6-bc32-8570638bdf8f" name="dampen" value="5s"/>
               <nvpair id="bdf08a1f-4f66-4ac1-a56d-7dfb482ae984" name="multiplier" value="100"/>
             </attributes>
           </instance_attributes>
           <operations>
             <op id="6c30c8f0-e497-4a04-91a7-2d174c27d4d2" name="monitor" interval="5s" timeout="10s"/>
           </operations>
         </primitive>
       </clone>
       <group id="TACO_SERVICES">
         <primitive id="VIP" class="ocf" type="IPaddr2" provider="heartbeat">
           <instance_attributes id="VIP_instance_attrs">
             <attributes>
               <nvpair id="VIP_target_role" name="target_role" value="started"/>
               <nvpair id="978e63ce-c11b-473c-9cd4-22f109e25a98" name="ip" value="10.31.70.7"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="lsb" type="tacormi" provider="heartbeat" id="TACORMI">
           <instance_attributes id="TACORMI_instance_attrs">
             <attributes>
               <nvpair name="target_role" id="TACORMI_target_role" value="started"/>
             </attributes>
           </instance_attributes>
           <operations>
             <op id="02df13ab-ea17-4142-bd5e-3f1e815c3178" name="monitor" interval="10s" timeout="60s"/>
           </operations>
         </primitive>
         <primitive class="lsb" type="tomcat" provider="heartbeat" id="TOMCAT">
           <instance_attributes id="TOMCAT_instance_attrs">
             <attributes>
               <nvpair name="target_role" id="TOMCAT_target_role" value="started"/>
             </attributes>
           </instance_attributes>
           <operations>
             <op id="a4d6e1e9-c733-45ab-a407-0713d4e3be19" name="monitor" interval="10s" timeout="60s"/>
           </operations>
         </primitive>
         <instance_attributes id="TACO_SERVICES_instance_attrs">
           <attributes>
             <nvpair id="TACO_SERVICES_target_role" name="target_role" value="started"/>
           </attributes>
         </instance_attributes>
       </group>
     </resources>
     <constraints>
       <rsc_location id="my_resource:connected" rsc="TACO_SERVICES">
         <rule id="my_resource:connected:rule" score_attribute="pingd">
           <expression id="my_resource:connected:expr:gateway" attribute="pingd" operation="defined" value="10.31.70.1"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
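
The constraint in the cib.xml above is not identical to the guide's rule quoted earlier in this message. A small Python sketch, using only the standard library, makes the difference explicit by comparing the attributes of the two `<expression>` elements. Both XML fragments are copied from this post; the helper name is hypothetical. Whether the extra attribute changes how the rule is evaluated is not something this message establishes.

```python
import xml.etree.ElementTree as ET

# Rule as quoted from the guide earlier in the post.
GUIDE_RULE = """
<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule" score_attribute="pingd">
    <expression id="my_resource:connected:expr:defined"
      attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>
"""

# Rule as it appears in the <constraints> section of the attached cib.xml.
POSTED_RULE = """
<rsc_location id="my_resource:connected" rsc="TACO_SERVICES">
  <rule id="my_resource:connected:rule" score_attribute="pingd">
    <expression id="my_resource:connected:expr:gateway" attribute="pingd"
      operation="defined" value="10.31.70.1"/>
  </rule>
</rsc_location>
"""


def expression_attrs(xml_text):
    """Return the attributes of the rule's <expression>, ignoring its id."""
    root = ET.fromstring(xml_text)
    expr = root.find("./rule/expression")
    attrs = dict(expr.attrib)
    attrs.pop("id", None)
    return attrs


guide = expression_attrs(GUIDE_RULE)
posted = expression_attrs(POSTED_RULE)

# Attributes present in the posted rule that the guide's rule lacks or sets differently.
extra = {key: value for key, value in posted.items() if guide.get(key) != value}

print("guide: ", guide)
print("posted:", posted)
print("only in posted rule:", extra)
```

The only difference the comparison reports is the `value="10.31.70.1"` attribute on the posted expression, alongside `operation="defined"`.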

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
