Hi,

I have a two-node cluster configured with a group of IPaddr2 resources: four IP addresses, each on a separate interface. Each resource starts successfully, and if the heartbeat service or the box fails, the resources move across to the other node. If I manually take down an interface using ifdown <interface>, heartbeat recognises that the interface is down and restarts it.

The only issue is that when the Ethernet cable is removed, heartbeat doesn't notice and leaves the resources running on the main node.

To overcome this I tried to configure pingd. An extract from cib.xml is below:

<primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
  <instance_attributes id="pingd:connected_instance_attrs">
    <attributes>
      <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3" name="pidfile" value="/tmp/ha_pingd_pid"/>
      <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca" name="host_list" value="carbon dubnium sydsw1"/>
      <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>
      <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
    </attributes>
  </instance_attributes>
</primitive>

<rsc_location id="group_1:connected" rsc="group_1">
  <rule id="group_1:connected:rule" score_attribute="pingd:connected">
    <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
  </rule>
</rsc_location>
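
For anyone who wants to read the two fragments side by side, here is a minimal Python sketch that parses them and lists the pingd instance attributes next to the attribute names the location rule refers to. The helper names (pingd_params, rule_attributes) are made up for illustration, and the script only reads the XML quoted above; it makes no claim about how the CRM evaluates the rule.

```python
import xml.etree.ElementTree as ET

# The two cib.xml fragments quoted above, copied verbatim.
PRIMITIVE = """
<primitive id="pingd:connected" class="ocf" type="pingd" provider="heartbeat">
  <instance_attributes id="pingd:connected_instance_attrs">
    <attributes>
      <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3" name="pidfile" value="/tmp/ha_pingd_pid"/>
      <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca" name="host_list" value="carbon dubnium sydsw1"/>
      <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb" name="name" value="pingd:connected:id"/>
      <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b" name="multiplier" value="100"/>
    </attributes>
  </instance_attributes>
</primitive>
"""

LOCATION = """
<rsc_location id="group_1:connected" rsc="group_1">
  <rule id="group_1:connected:rule" score_attribute="pingd:connected">
    <expression id="group_1:connected:expr:defined" attribute="pingd:connected" operation="defined"/>
  </rule>
</rsc_location>
"""


def pingd_params(primitive_xml):
    """Return the nvpair name/value pairs of the primitive as a dict."""
    root = ET.fromstring(primitive_xml)
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


def rule_attributes(location_xml):
    """Return the attribute names the location rule refers to."""
    rule = ET.fromstring(location_xml).find("rule")
    return {
        "score_attribute": rule.get("score_attribute"),
        "expression_attributes": [e.get("attribute") for e in rule.iter("expression")],
    }


if __name__ == "__main__":
    for key, value in pingd_params(PRIMITIVE).items():
        print(f"pingd parameter {key} = {value}")
    attrs = rule_attributes(LOCATION)
    print("rule score_attribute =", attrs["score_attribute"])
    print("rule expression attributes =", attrs["expression_attributes"])
```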

With this in place the cluster is just as happy with the situation as before, even though the node with the failed network connection cannot ping any of those hosts.

In the log from the first node:-
Oct  2 15:57:06 sydgw1 lrmd: [32694]: info: RA output: (pingd:connected:start:stdout) Adding ping host carbonAdding ping host dubniumAdding ping host sydsw1
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation pingd:connected_start_0 (call=16, rc=0) complete
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: build_operation_update: Digest for 0:0;13:2:d1e63583-0eba-4a44-8b53-b10ed4aa449e (pingd:connected_start_0) was 30362598aa31f8e8d68c0c9870c6703c
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: log_data_element: build_operation_update: digest:source <parameters multiplier="100" name="pingd:connected:id" host_list="carbon dubnium sydsw1" pidfile="/tmp/ha_pingd_pid"/>
Oct  2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM operation IPaddr2_4_monitor_5000 (call=15, rc=0) complete
Oct  2 15:57:11 sydgw1 pingd: [643]: info: do_node_walk: Requesting the list of configured nodes
Oct  2 15:57:11 sydgw1 attrd: [32696]: info: find_hash_entry: Creating hash entry for pingd:connected:id
Oct  2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
Oct  2 15:57:11 sydgw1 pingd: [643]: info: main: Starting pingd
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd:connected:id
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_ha_callback: flush message from sydgw1.zomojo.com
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
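
To make the relevant values easier to spot in a longer log, here is a small Python sketch that pulls out the number of active ping nodes reported by pingd and the attribute updates sent by attrd. The regular expressions assume the exact message wording shown in the lines above, and the function name summarize is hypothetical; other heartbeat versions may log differently.

```python
import re

# Two of the log lines quoted above, used as sample input.
LOG = """\
Oct  2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
Oct  2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent update 3: pingd:connected:id=0
"""

# Patterns are assumptions based only on the message format seen in this log.
ACTIVE_RE = re.compile(r"pingd: \[\d+\]: info: send_update: (\d+) active ping nodes")
UPDATE_RE = re.compile(r"attrd_perform_update: Sent update \d+: (\S+)=(\S+)")


def summarize(log_text):
    """Return (active ping node counts, (attribute, value) updates) found in the log."""
    active = [int(m.group(1)) for m in ACTIVE_RE.finditer(log_text)]
    updates = [(m.group(1), m.group(2)) for m in UPDATE_RE.finditer(log_text)]
    return active, updates


if __name__ == "__main__":
    active, updates = summarize(LOG)
    for count in active:
        print("active ping nodes:", count)
    for name, value in updates:
        print(f"attribute update: {name} = {value}")
```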


What have I missed?

Thanks for your help

Phil.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
