Hi,

On Mon, Feb 25, 2008 at 03:36:31PM -0500, Doug Lochart wrote:
> heartbeat 2.1.3_3 and drbd 8.0.8 (dopd and STONITH ipmi in use)
>
> I successfully was able to test my 2 node cluster simply by powering
> the nodes off and on in varying order and the HA resources
> successfully moved in each case (hurray).
> Now I went back to my original test of previous frustration. I yanked
> all the ethernet cables from the primary machine (both LAN and
> crossover)
>
> On the Secondary (unaffected) machine I see that STONITH tried to
> shoot the other node for about 20 minutes before giving up. Right now
> my secondary node says Secondary/Unknown and the Primary Node says
> Primary/Unknown.
>
> First off is there a configurable parameter for STONITH on how long it tries?
No. It should be trying forever. That's what is in the cluster
configuration, i.e. protect resources using stonith, and the
cluster shouldn't move until there has been a successful reset
operation.

> When I plug the network back into the Primary immediately rebooted
> (not sure why)

Either stonith or fastfail. The logs would say.

> and when it came back up I was in split brain again.
>
> So whenever you have 2 nodes in a cluster and all redundant
> communication paths have been suffered by default then you will have a
> Split Brain that needs to be manually corrected. Am I understanding
> this right?

No, it should recover automatically. Please take a look at the
logs or post them.

Thanks,

Dejan

> I am not complaining I am just trying to determine what I am to expect
> so I can write up procedures and what not. The failover worked great
> with other tests.
>
> regards,
>
> Doug
>
> --
> What profits a man if he gains the whole world yet loses his soul?
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
