[Linux-HA] pingd and resources

Phil Manuel Mon, 01 Oct 2007 23:18:20 -0700

Hi,

I have a two-node cluster configured with a group of IPaddr2 resources: 4 IP addresses, each on a separate interface. Each resource starts successfully, and if the heartbeat service fails or the box fails they transition across to the other node. If I manually take down an interface using ipdown <interface>, then heartbeat recognises the interface is down and restarts it.

The only issue I have is when the ethernet cable is removed: heartbeat just doesn't notice, leaving the resources running on the main node.

In order to overcome this situation I tried to configure pingd, extract from cib.xml below:-

        <primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
          <instance_attributes id="pingd:connected_instance_attrs">
            <attributes>
              <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3" name="pidfile" value="/tmp/ha_pingd_pid"/>
              <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca" name="host_list" value="carbon dubnium sydsw1"/>
              <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>
              <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
            </attributes>
          </instance_attributes>
        </primitive>

      <rsc_location id="group_1:connected" rsc="group_1">
        <rule id="group_1:connected:rule" score_attribute="pingd:connected">
          <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
        </rule>
      </rsc_location>

The cluster is just as happy with the situation as before, even though the node with the failed network connection cannot ping any of those hosts.
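One thing that may be worth checking in a configuration like this is whether the attribute name the rule refers to is the same name pingd is told to publish. The following is only a rough Python sketch of such a cross-check, not a Heartbeat tool. It assumes the location rule has to reference the value of the pingd "name" parameter. The <cib_fragment> wrapper element, the shortened nvpair ids and the helper function names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment based on the cib.xml extract in this post. The <cib_fragment>
# wrapper is hypothetical (added so the text parses as one document) and the
# nvpair ids are shortened for readability.
CIB_FRAGMENT = """
<cib_fragment>
  <primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
    <instance_attributes id="pingd:connected_instance_attrs">
      <attributes>
        <nvpair id="nv-host-list" name="host_list" value="carbon dubnium sydsw1"/>
        <nvpair id="nv-name" name="name" value="pingd:connected:id"/>
        <nvpair id="nv-multiplier" name="multiplier" value="100"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <rsc_location id="group_1:connected" rsc="group_1">
    <rule id="group_1:connected:rule" score_attribute="pingd:connected">
      <expression id="group_1:connected:expr:defined"
                  attribute="pingd:connected" operation="defined"/>
    </rule>
  </rsc_location>
</cib_fragment>
"""


def published_attribute(root):
    """Return the value of the 'name' nvpair of the first pingd primitive."""
    for prim in root.iter("primitive"):
        if prim.get("type") == "pingd":
            for nv in prim.iter("nvpair"):
                if nv.get("name") == "name":
                    return nv.get("value")
    return None


def referenced_attributes(root):
    """Collect attribute names used by rules and by their expressions."""
    names = set()
    for rule in root.iter("rule"):
        if rule.get("score_attribute"):
            names.add(rule.get("score_attribute"))
        for expr in rule.iter("expression"):
            if expr.get("attribute"):
                names.add(expr.get("attribute"))
    return names


root = ET.fromstring(CIB_FRAGMENT)
published = published_attribute(root)
referenced = referenced_attributes(root)
# Names the rules rely on that pingd is not configured to publish.
mismatched = sorted(n for n in referenced if n != published)

print("pingd publishes:", published)
print("rules reference:", sorted(referenced))
print("referenced but not published:", mismatched)
```

If the assumption holds, any name listed on the last line is one the rule depends on but pingd never sets under that name.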


In the log from the first node:-

Oct  2 15:57:06 sydgw1 lrmd: [32694]: info: RA output: (pingd:connected:start:stdout) Adding ping host carbon
Adding ping host dubnium
Adding ping host sydsw1
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation pingd:connected_start_0 (call=16, rc=0) complete
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: build_operation_update: Digest for 0:0;13:2:d1e63583-0eba-4a44-8b53-b10ed4aa449e (pingd:connected_start_0) was 30362598aa31f8e8d68c0c9870c6703c
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: log_data_element: build_operation_update: digest:source <parameters multiplier="100" name="pingd:connected:id" host_list="carbon dubnium sydsw1" pidfile="/tmp/ha_pingd_pid"/>
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation IPaddr2_4_monitor_5000 (call=15, rc=0) complete
Oct  2 15:57:11 sydgw1 pingd: [643]: info: do_node_walk: Requesting the list of configured nodes
Oct  2 15:57:11 sydgw1 attrd: [32696]: info: find_hash_entry: Creating hash entry for pingd:connected:id
Oct  2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
Oct  2 15:57:11 sydgw1 pingd: [643]: info: main: Starting pingd
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd:connected:id
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_ha_callback: flush message from sydgw1.zomojo.com
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
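To read the log excerpt more easily, the two relevant facts (how many ping nodes pingd counted as active, and which attribute name and value attrd sent) can be pulled out with a couple of regular expressions. This is a minimal Python sketch. The regexes are written against the two message formats shown in this log only. The idea that the published value equals active nodes times the multiplier is an assumption about how pingd scores, and the summarize helper is a hypothetical name.

```python
import re

# Two lines copied from the log excerpt in this post.
LOG = """\
Oct  2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
"""

# Patterns matching only the message formats seen above.
ACTIVE_RE = re.compile(r"send_update: (\d+) active ping nodes")
UPDATE_RE = re.compile(r"attrd_perform_update: Sent update \d+: (\S+)=(\S+)")

MULTIPLIER = 100  # value of the "multiplier" nvpair in the cib.xml extract


def summarize(log_text, multiplier):
    """Return (active ping nodes, assumed attribute value, attrd updates)."""
    active = None
    updates = []
    for line in log_text.splitlines():
        m = ACTIVE_RE.search(line)
        if m:
            active = int(m.group(1))
        m = UPDATE_RE.search(line)
        if m:
            updates.append((m.group(1), int(m.group(2))))
    # Assumption: pingd publishes active_nodes * multiplier.
    expected = None if active is None else active * multiplier
    return active, expected, updates


active, expected, updates = summarize(LOG, MULTIPLIER)
print("active ping nodes:", active)
print("value expected under the assumption:", expected)
print("attrd updates:", updates)
```

On this excerpt, the attribute that attrd reports is named pingd:connected:id and its value is 0.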



What have I missed?

Thanks for your help

Phil.
