Jonas Andradas wrote:
Hello Steve,
On Wed, Jan 30, 2008 at 7:55 PM, Steve Wray <[EMAIL PROTECTED]> wrote:
Jonas Andradas wrote:
Hello,
if my memory doesn't fail me, the cib.xml file should be located in:
/var/lib/heartbeat/crm/cib.xml
Thanks for that.
The "base" cib.xml contains just the configuration. During operation,
values are added and modified on the fly, as you say, with the score of
each node, the pingd score, and so on.
So the XML code given on the pingd documentation page does need to be
*manually* inserted into cib.xml?
Yes, that pingd code has to be inserted into the cib.xml. During execution,
a section of the XML (I cannot remember right now which tags it sits
between) is updated on the fly with execution data, such as (as stated
previously) node score, pingd score, and so on. The node with the highest
score is the 'winner' node, the one resources would prefer (though it might
*not* be the one they actually run on: depending on how
resource_stickiness is set, resources may stay on a lower-scored node
unless they are forced to switch).
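For reference, stickiness can also be set cluster-wide in the crm_config
section of cib.xml. A rough sketch from memory (the id values here are
illustrative, and the attribute name may differ between heartbeat
versions, so check the docs for yours):

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- keep resources where they are unless forced to move -->
      <nvpair id="opt-stickiness" name="default_resource_stickiness"
              value="INFINITY"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```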
Ok I now have cib.xml working but the behavior of the cluster is still
strange.
I took the code from the pingd documentation and inserted it into the
cib.xml as follows:
<constraints>
  <rsc_location id="rsc_location_group_1" rsc="group_1">
    <rule id="prefered_location_group_1" score="100">
      <expression attribute="#uname" id="prefered_location_group_1_expr"
                  operation="eq" value="drbd-test-1"/>
    </rule>
  </rsc_location>
  <rsc_location id="my_resource:connected" rsc="my_resource">
    <rule id="my_resource:connected:rule" score="-INFINITY" boolean_op="or">
      <expression id="my_resource:connected:expr:undefined"
                  attribute="pingd" operation="not_defined"/>
      <expression id="my_resource:connected:expr:zero"
                  attribute="pingd" operation="lte" value="0"/>
    </rule>
  </rsc_location>
</constraints>
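For context, my understanding is that the constraints element sits
alongside resources inside the configuration section of the cib, roughly
like this (other sections elided, structure from memory):

```xml
<cib>
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints>
      <!-- rsc_location rules go here -->
    </constraints>
  </configuration>
  <status/>
</cib>
```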
The documentation is not clear on this, but is that the correct place to
insert the code fragment?
Note that the ha.cf files now look like this:
crm yes
logfacility local0
keepalive 100ms
deadping 5
deadtime 30
warntime 10
ucast eth0 10.10.2.26
ucast eth0 10.10.2.27
node drbd-test-1
node drbd-test-2
auto_failback off
ping 10.10.10.1
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s -a pingd
The observed behavior is now that if the passive node loses network
connectivity but the active node can still contact its ping node, the
active node tries to become passive... but fails, as it cannot unmount its
NFS filesystem or stop drbd. It does relinquish the floating IP address,
though, and effectively fails. Kind of the opposite of what I am after...
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems