Hi all,
I wrote my own OCF resource agent (RA) for my diploma thesis and finally
got it to work in Heartbeat. Now I'm busy setting up a virtual
validation cluster for my RA, where virtual means that I'm using
VirtualBox (see http://www.virtualbox.org) as virtualization software.
I plan to first get a two-node cluster working properly and then
add one node at a time until I have a working
five-node cluster. The nodes run SLES10.
The issue I now have is a typical split-brain:
I start my RA together with an NFS mount (Filesystem resource) and a
virtual IP address (IPaddr2 resource) bundled in a resource group on
node1 while node2 sits idle. Shutting down the network interface
on node1 causes
- node2 to decide to take over the resource group and mark node1 as
offline
- node1 to decide to keep the resource group and mark node2 as offline
When I then bring the network interface on node1 back up, the two
nodes don't find each other again, and entries in the logs
(/var/log/messages) tell me that heartbeats from the other node are
being lost.
As I don't have a STONITH device in this scenario, and the ssh STONITH
plugin does not work either because each node has only a single network
interface, STONITH does not seem to be an option here.
I thought of adding another resource that checks the connectivity to the
NFS server and reboots the node if the server cannot be reached.
Could pingd be what I'm looking for? Any other ideas what I could do?
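To make the pingd idea more concrete, this is roughly the kind of
constraint I have in mind: keep the resource off any node whose ping
attribute is undefined or zero. It is an untested sketch. my_resource
and default_ping_set are placeholder names borrowed from the Linux-HA
pingd page, the expression ids and the -INFINITY rule are my own
assumptions, and it would only stop the resource on a disconnected
node, not reboot that node.

```xml
<!-- Untested sketch: forbid my_resource on nodes without connectivity. -->
<!-- Assumes pingd publishes its result as the node attribute -->
<!-- default_ping_set; the ids below are made up for illustration. -->
<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule" score="-INFINITY" boolean_op="or">
    <!-- Matches if pingd has not set the attribute on this node -->
    <expression id="my_resource:connected:expr:undefined"
                attribute="default_ping_set" operation="not_defined"/>
    <!-- Matches if no ping node is reachable from this node -->
    <expression id="my_resource:connected:expr:zero"
                attribute="default_ping_set" operation="lte" value="0"/>
  </rule>
</rsc_location>
```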
While reading the article on pingd on the Linux-HA website
(http://linux-ha.org/pingd), I stumbled over an example which I either
don't understand or which contains a mistake:
The example constraint:
<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule"
        score_attribute="default_ping_set">
    <expression id="my_resource:connected:expr:defined"
                attribute="default_ping_set" operation="defined"/>
  </rule>
</rsc_location>
The statement which confuses me:
requires the value of default_ping_set to be greater than 100 (c001n05
is unaltered)
--> shouldn't this be: requires the value of default_ping_set to be
greater than 0 (c001n05 is unaltered) ?
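To show what I mean: as far as I can tell, the constraint above only
tests that default_ping_set is defined and then uses its value as the
score, so it enforces no threshold at all. If a threshold were really
intended, I would have expected an explicit comparison, roughly like
the sketch below. The expression id and the operation="gt" value="0"
pair are my own guess and are not taken from the page.

```xml
<!-- Guess, not from the Linux-HA page: the rule applies only where -->
<!-- default_ping_set is greater than 0, and the attribute value is -->
<!-- then used as the score for my_resource on that node. -->
<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule"
        score_attribute="default_ping_set">
    <expression id="my_resource:connected:expr:positive"
                attribute="default_ping_set" operation="gt" value="0"/>
  </rule>
</rsc_location>
```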
Thanks and regards, :)
Lars (D. Forseth).
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems