On 7/24/07, Lars Daniel Forseth <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I wrote my own OCF resource agent (RA) for my diploma thesis and finally
> got it to work in Heartbeat. Now I'm busy setting up a virtual
> validation cluster for my RA, where "virtual" means that I'm using
> VirtualBox (see http://www.virtualbox.org) as the virtualization software.
> I planned to first get a two-node cluster working properly and then
> add one node after the other until I have a working five-node cluster.
> The nodes run SLES10.
>
> The issue I now have is the typical split-brain:
>
> I start my RA together with an NFS mount (Filesystem resource) and a
> virtual IP address (IPaddr2 resource), bundled in a resource group, on
> node1 while node2 is happy and idle... Shutting the network interface
> down on node1 causes
>
> - node2 to decide to take over the resource group and declare node1
>   offline
>
> - node1 to decide to keep the resource group and declare node2 offline
>
> When I then bring the network interface on node1 back up, the two
> nodes don't find each other again, and entries in the logs
> (/var/log/messages) tell me that heartbeats from the other node are
> being lost.
>
> As I don't have a STONITH device in this scenario, and the ssh STONITH
> device does not work either because there is only a single network
> interface, STONITH does not seem to be the way to go here.
>
> I thought of adding another resource that checks the connectivity to the
> NFS server and reboots the node if it cannot reach the NFS server.
> Could pingd be what I'm searching for?
Unlikely. The most pingd will do is stop resources on nodes with no
connectivity. Basically, there is no substitute for STONITH.

Why not try writing a STONITH agent for VirtualBox? I recently wrote one
for VMware...

> Any other ideas what I could do?
>
> While reading the article on pingd on the Linux-HA website
> (http://linux-ha.org/pingd), I stumbled over an example which I either
> don't understand or which contains a mistake:
>
> The example constraint:
>
> <rsc_location id="my_resource:connected" rsc="my_resource">
>   <rule id="my_resource:connected:rule"
>         score_attribute="default_ping_set">
>     <expression id="my_resource:connected:expr:defined"
>                 attribute="default_ping_set" operation="defined"/>
>   </rule>
> </rsc_location>
>
> The statement which confuses me:
>
>   requires the value of default_ping_set to be greater than 100
>   (c001n05 is unaltered)
>
> --> Shouldn't this be: requires the value of default_ping_set to be
> greater than 0 (c001n05 is unaltered)?

It's not inherently wrong. The statement just says that you need to be
connected to at least _two_ ping nodes instead of one.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
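A sketch of the "at least two ping nodes" reading, not the pingd article's own example: pingd publishes connectivity as a node attribute (default_ping_set here), and a location rule then decides where a resource may run. Assuming pingd's multiplier is set to 100, so that each reachable ping node adds 100 to default_ping_set, a value greater than 100 is only possible when at least two ping nodes answer. The constraint below uses hypothetical ids and bans my_resource from any node where the attribute is missing or is at most 100. Like any pingd rule, it only moves resources; it does not fence the other node.

```xml
<!-- Sketch only: ids are hypothetical. Assumes a pingd multiplier of 100,
     i.e. default_ping_set = 100 x (number of reachable ping nodes). -->
<rsc_location id="my_resource:two_ping_nodes" rsc="my_resource">
  <!-- Forbid the node if either expression matches -->
  <rule id="my_resource:two_ping_nodes:rule" score="-INFINITY" boolean_op="or">
    <!-- pingd has not reported anything for this node yet -->
    <expression id="my_resource:two_ping_nodes:expr:undefined"
                attribute="default_ping_set" operation="not_defined"/>
    <!-- 100 or less means at most one ping node is reachable -->
    <expression id="my_resource:two_ping_nodes:expr:low"
                attribute="default_ping_set" operation="lte"
                value="100" type="number"/>
  </rule>
</rsc_location>
```

Under the same multiplier assumption, lowering the threshold to 0 would require only one reachable ping node, which is the reading the question expected.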
