>>> On 11/6/2009 at 04:29 AM, c smith <[email protected]> wrote:
> Hi All-
>
> I have read all the documentation I can find, including Dejan's PDF on the
> subject but still have some questions regarding STONITH. I have so far been
> testing with the external SSH plugin just to get a feel for how it works and
> will be using an APC SNMP device in production.
>
> So far it has been much more straight-forward than I had expected, however,
> I have a couple of questions regarding certain scenarios that may occur and
> how STONITH/stonithd reacts. If someone can weigh in and offer some insight,
> it would help clear this up for me! This is all being tested on a two node
> cluster with heartbeat + pacemaker.
>
> - So far I have noticed that, when disconnecting all network devices on a
> node, the STONITH survivor is the node that was DC before network
> connections dropped. Is there a way to migrate the role of DC to another
> node?

Well, you could stop heartbeat on the DC node before disconnecting the
network, but you probably don't want to do that. For all practical
purposes, it shouldn't matter where the DC is, or which node takes over
resources after split-brain, provided one node is killed and/or the
cluster is able to reform after STONITH.

> - When all resources are running on node1, and node2 is the DC, if I
> unplug/`ifconfig down` heartbeat's interfaces on either node, node1 becomes
> STONITH victim and resources are migrated to node2. After disconnection,
> both nodes can reach the outside network and it would be okay for resources
> to run on either. Is there a way to work with scoring/pingd/something-else
> so that the node not running any resources becomes the victim to avoid
> failover? Does resource scoring have influence on STONITH at all?

Resource scores don't influence STONITH. It's a question of "the other
node looks like it's dead, I'd better make sure it's *really* dead,
regardless of where the resources are".

> - With SNMP and other stonith plugins that require network connectivity, is
> it to be assumed that a node whos lost network connectivity is as good as
> dead, STONITH'd from the other node that is still able to reach the STONITH
> device? From what i've found during my initial tests, when a node drops
> from the network it attempts to STONITH the other but can't connect and
> fails. Is this the way it is intended to work? Both nodes STONITH each
> other and the one that succeeds wins?

That's what'll happen. Absent evidence of life from either node, the
only safe thing to do is try to kill it. With only two nodes, you can't
assume anything else, as there's no clear majority. By comparison, if
there were three nodes, and one node couldn't see the others, it could
safely assume that *it* was faulty.

HTH,

Tim

-- 
Tim Serong <[email protected]>
Senior Clustering Engineer, Novell Inc.
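
P.S. In case it helps to see the majority argument spelled out, here is a
minimal sketch in Python. It is only an illustration of the reasoning
above, not how heartbeat or pacemaker actually computes quorum; the
function names and the returned strings are made up for this example.

```python
def has_majority(visible: int, total: int) -> bool:
    """True if this partition holds a strict majority of the cluster.

    `visible` counts the nodes this node can still see, including itself.
    """
    return 2 * visible > total


def action_on_lost_peers(visible: int, total: int) -> str:
    """Hypothetical decision a node makes after losing sight of peers."""
    if visible == total:
        return "nothing to do"
    if has_majority(visible, total):
        # We are on the winning side: make sure the unseen nodes are dead.
        return "fence the unseen nodes"
    if total == 2:
        # One of two is never a majority, so neither side can conclude
        # anything. Both try to fence, and whoever succeeds wins.
        return "try to fence the peer"
    # In a larger cluster, a node in the minority can assume it is the problem.
    return "assume I am the faulty one"


if __name__ == "__main__":
    for visible, total in [(1, 2), (1, 3), (2, 3)]:
        print(visible, "of", total, "->", action_on_lost_peers(visible, total))
```

The two-node case is the special one: each side sees one node out of two,
so the only safe move left is the race to fence described above.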
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
