Hi all,

I wrote my own OCF resource agent (RA) for my diploma thesis and finally got it to work in Heartbeat. Now I'm busy setting up a virtual validation cluster for my RA, where "virtual" means that I'm using VirtualBox (see http://www.virtualbox.org) as the virtualization software. I planned to first get a two-node cluster working properly and then add one node after another until I have a working five-node cluster. The nodes run SLES10.

The problem I'm now facing is a classic split-brain:

I start my RA together with an NFS mount (Filesystem resource) and a virtual IP address (IPaddr2 resource), bundled in a resource group, on node1 while node2 sits idle. Shutting down the network interface on node1 causes

- node2 to take over the resource group and declare node1 offline

- node1 to keep the resource group and declare node2 offline

When I then bring the network interface on node1 back up, the two nodes do not find each other again, and the log entries in /var/log/messages report that heartbeats from the other node are being lost.

Since I don't have a STONITH device in this setup, and the ssh STONITH device doesn't work either because each node has only a single network interface, STONITH does not seem to be an option here.

I thought of adding another resource that checks connectivity to the NFS server and reboots the node if the NFS server cannot be reached. Could pingd be what I'm looking for? Any other ideas about what I could do?
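For illustration, the check idea above might look roughly like this minimal Python sketch. It assumes NFS over TCP on port 2049; the function names and the threshold of three consecutive failures are hypothetical choices, not part of Heartbeat or pingd. It only decides whether a reboot would be triggered: it does not reboot anything and is not wired into Heartbeat as a resource agent.

```python
import socket
from typing import Sequence

NFS_PORT = 2049  # assumption: the NFS server accepts TCP connections on port 2049


def nfs_server_reachable(host: str, port: int = NFS_PORT,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failure, ...
        return False


def should_reboot(history: Sequence[bool], threshold: int = 3) -> bool:
    """Hypothetical policy: reboot only after `threshold` consecutive failures.

    `history` holds past check results, oldest first (True = reachable).
    Requiring several failures in a row avoids reacting to a single
    dropped packet.
    """
    if threshold <= 0 or len(history) < threshold:
        return False
    # Reboot only if none of the most recent `threshold` checks succeeded.
    return not any(history[-threshold:])
```

A monitor loop would append the result of `nfs_server_reachable(...)` to the history on each interval and act only when `should_reboot` returns True. Keeping the decision in a pure function makes the policy easy to test without a network.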


While reading the article on pingd on the Linux-HA website (http://linux-ha.org/pingd), I stumbled over an example that I either don't understand or that contains a mistake:

The example constraint:

<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule" score_attribute="default_ping_set">
    <expression id="my_resource:connected:expr:defined" attribute="default_ping_set" operation="defined"/>
  </rule>
</rsc_location>


The statement which confuses me:

requires the value of default_ping_set to be greater than 100 (c001n05 is unaltered)


--> shouldn't this be: requires the value of default_ping_set to be greater than 0 (c001n05 is unaltered) ?
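To make my reading of the constraint concrete, here is a toy Python model of it. It is only a sketch under my own assumptions: that operation="defined" makes the rule match only on nodes where default_ping_set is set, that score_attribute makes a matching rule contribute the attribute's value as its score, and that scores are plain integers (Heartbeat's INFINITY handling is ignored). The function name, the node names other than c001n05, and the attribute values (which assume a pingd multiplier of 100) are hypothetical.

```python
from typing import Dict


def location_score(node_attrs: Dict[str, int], base_score: int = 0,
                   attr: str = "default_ping_set") -> int:
    """Toy model of the example rsc_location rule (assumed semantics).

    operation="defined": the rule only matches nodes where `attr` is set.
    score_attribute: a matching rule contributes the attribute's value,
    rather than a fixed score, to the node's preference for my_resource.
    """
    if attr in node_attrs:
        return base_score + node_attrs[attr]
    # Rule does not match: the node's score is left unaltered.
    return base_score


# Hypothetical attribute values; only c001n05 appears in the pingd article.
nodes = {
    "c001n01": {"default_ping_set": 200},  # e.g. two ping nodes reachable (multiplier 100)
    "c001n02": {"default_ping_set": 100},  # e.g. one ping node reachable
    "c001n05": {},                         # attribute not defined
}

for name, attrs in nodes.items():
    print(name, location_score(attrs))
```

Under these assumptions the rule itself contains no threshold: it simply adds each node's default_ping_set value, and leaves nodes without the attribute unaltered.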




Thanks and regards :)

Lars (D. Forseth).



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
