Hi,

On Tue, Oct 02, 2007 at 04:00:23PM +1000, Phil Manuel wrote:
> Hi,
>
> I have a two node cluster configured with a group of IPAddr2 resources,
> 4 ip addresses each on a separate interface. Each resource successfully
> starts and if the heartbeat service fails or the box fails they
> transition across to the other node. If I manually take down an
> interface using ipdown <interface>, then heartbeat recognises the
> interface is down and restarts it.
>
> The only issue I have is when the ethernet cable is removed, heartbeat
> just doesn't notice, leaving the resources running on the main node.
>
> In order to overcome this situation I tried to configure pingd, extract
> from cib.xml below:-
>
> <primitive id="pingd:connected" class="ocf" type="pingd"
> provider="heartbeat">
> <instance_attributes id="pingd:connected_instance_attrs">
> <attributes>
> <nvpair id="15c8d68d-9729-4db9-b92e-141d30e8eac3"
> name="pidfile" value="/tmp/ha_pingd_pid"/>
> <nvpair id="6b01b3be-c298-4f2e-8d08-e22084f5c5ca"
> name="host_list" value="carbon dubnium sydsw1"/>
> <nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb"
> name="name" value="pingd:connected:id"/>
This name ...

> <nvpair id="8cd4aff4-117b-4e33-ad4c-fe3cd220255b"
> name="multiplier" value="100"/>
> </attributes>
> </instance_attributes>
> </primitive>
>
> <rsc_location id="group_1:connected" rsc="group_1">
> <rule id="group_1:connected:rule"
> score_attribute="pingd:connected">

... does not match this one.

Thanks,

Dejan

> <expression id="group_1:connected:expr:defined"
> attribute="pingd:connected" operation="defined"/>
> </rule>
> </rsc_location>
>
> This is just as happy with the situation as before, even though the node
> with the failed network connection in no way can ping those hosts.
>
> In the log from the first node:-
> Oct 2 15:57:06 sydgw1 lrmd: [32694]: info: RA output:
> (pingd:connected:start:stdout) Adding ping host carbonAdding ping host
> dubniumAdding ping host sydsw1
> Oct 2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM
> operation pingd:connected_start_0 (call=16, rc=0) complete
> Oct 2 15:57:06 sydgw1 crmd: [32697]: info: build_operation_update:
> Digest for 0:0;13:2:d1e63583-0eba-4a44-8b53-b10ed4aa449e
> (pingd:connected_start_0) was 30362598aa31f8e8d68c0c9870c6703c
> Oct 2 15:57:06 sydgw1 crmd: [32697]: info: log_data_element:
> build_operation_update: digest:source <parameters multiplier="100"
> name="pingd:connected:id" host_list="carbon dubnium sydsw1"
> pidfile="/tmp/ha_pingd_pid"/>
> Oct 2 15:57:06 sydgw1 crmd: [32697]: info: process_lrm_event: LRM
> operation IPaddr2_4_monitor_5000 (call=15, rc=0) complete
> Oct 2 15:57:11 sydgw1 pingd: [643]: info: do_node_walk: Requesting the
> list of configured nodes
> Oct 2 15:57:11 sydgw1 attrd: [32696]: info: find_hash_entry: Creating
> hash entry for pingd:connected:id
> Oct 2 15:57:11 sydgw1 pingd: [643]: info: send_update: 0 active ping nodes
> Oct 2 15:57:11 sydgw1 pingd: [643]: info: main: Starting pingd
> Oct 2 15:57:12 sydgw1 attrd: [32696]: info: attrd_trigger_update:
> Sending flush op to all hosts for: pingd:connected:id
> Oct 2 15:57:12 sydgw1 attrd: [32696]: info: attrd_ha_callback: flush
> message from sydgw1.zomojo.com
> Oct 2 15:57:12 sydgw1 attrd: [32696]: info: attrd_perform_update: Sent
> update 3: pingd:connected:id=0
>
>
> What have I missed ?
>
> Thanks for your help
>
> Phil.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
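
To spell out the mismatch pointed at in the reply: pingd publishes its result under the attribute named by the "name" parameter (pingd:connected:id here, as the attrd log lines show), while the location rule looks for an attribute called pingd:connected. A minimal sketch of one way to make the two agree follows. It assumes the intent is to keep the rule as written and rename the attribute; the ids are copied from the quoted configuration. Changing the rule's two attribute references to pingd:connected:id instead should work just as well.

```xml
<!-- Assumption: rename the attribute that pingd sets so it matches the rule. -->
<nvpair id="979fb490-8899-4368-a33a-d06c1ae8dadb"
        name="name" value="pingd:connected"/>

<!-- Unchanged from the quoted configuration: score_attribute and the
     expression both refer to the same attribute name as the nvpair above. -->
<rsc_location id="group_1:connected" rsc="group_1">
  <rule id="group_1:connected:rule" score_attribute="pingd:connected">
    <expression id="group_1:connected:expr:defined"
                attribute="pingd:connected" operation="defined"/>
  </rule>
</rsc_location>
```

With the names aligned, one would expect the attrd log entries to mention pingd:connected rather than pingd:connected:id, which is a quick way to check that the rule and the resource are talking about the same attribute.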
