On Mon, 2010-09-27 at 12:16 -0700, Robinson, Eric wrote:
> Not sure if you noticed in my previous message that I did physically
> power down the primary but the standby refused to take any action.

Yes, I did notice that. My point is that I have seen on my clusters that
simply powering the primary down (i.e. having it suddenly go away) may not
be enough. That requires the standby to simply assume that the primary has
gone away, and that it's not just a cable or NIC failure.

STONITH is a method of *assuring* that the other node has gone away. It is
designed to prevent both nodes from trying to run the same resources, which
can have disastrous consequences.

As I noted, I am not certain whether using STONITH is absolutely required
now, but I have observed the same symptoms as you, and I ended up having to
configure STONITH in order to get failovers to work properly.

Usually, though, if I explicitly set one node to standby, the other one
will take over, because the two nodes can exchange messages that convince
the remaining node that the standby node will not be running any resources.

So I really don't know if STONITH is your problem or would fix your
problem. I only note that I have seen the same symptoms and that this was
how I fixed it for my clusters.

--Greg

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
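To make the reasoning above concrete, here is a rough Python sketch of the decision the surviving node faces. It is only a toy model of the behaviour described in this message, not the cluster manager's actual logic; the names `PeerState`, `may_take_over` and `fence_peer` are made up for illustration, and the rule "no takeover without a confirmed fence" is an assumption drawn from the symptoms reported here.

```python
from enum import Enum


class PeerState(Enum):
    ONLINE = "online"            # peer is still answering cluster messages
    STANDBY = "standby"          # peer announced it will not run resources
    UNREACHABLE = "unreachable"  # peer went silent; the cause is unknown


def may_take_over(peer_state, stonith_enabled, fence_peer=None):
    """Return True if the surviving node may safely start the resources.

    fence_peer is a hypothetical callable that tries to power off the
    peer and returns True only when the power-off is confirmed.
    """
    if peer_state is PeerState.ONLINE:
        # The peer is alive and may be running the resources already.
        return False
    if peer_state is PeerState.STANDBY:
        # The peer told us it is not running anything, so there is no
        # risk of both nodes running the same resources.
        return True
    # Peer is unreachable: it may be powered off, or it may only be a
    # cable or NIC failure. Without fencing we cannot tell which.
    if not stonith_enabled or fence_peer is None:
        return False
    # With STONITH, take over only once the peer is confirmed down.
    return bool(fence_peer())


if __name__ == "__main__":
    # Primary loses power, no STONITH configured: standby does nothing.
    print(may_take_over(PeerState.UNREACHABLE, stonith_enabled=False))
    # Primary was put into standby explicitly: the other node takes over.
    print(may_take_over(PeerState.STANDBY, stonith_enabled=False))
```

In this model the powered-down primary and a pulled network cable look identical to the standby, which is why it refuses to act until something confirms that the peer is really gone.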
