Hi Greg,

a common and often overlooked cause is a stop action timeout that is set too low. If this value is too small, the stop action times out, and the failed stop leads to the node being fenced (STONITH).
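One way to get an overview of the configured stop timeouts is to scan the CIB for them. The following is only a rough sketch, not an official tool. The resource names, the sample values, the 60-second threshold, and the helper functions `to_seconds` and `short_stop_timeouts` are all made up for illustration, and the time parser is simplified (whole numbers with an optional s/m/h unit only).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment; resource names and timeouts are invented.
SAMPLE_CIB = """
<resources>
  <primitive id="fs_data" class="ocf" provider="heartbeat" type="Filesystem">
    <operations>
      <op id="fs_data-stop-0" name="stop" interval="0" timeout="20s"/>
    </operations>
  </primitive>
  <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2">
    <operations>
      <op id="vip-stop-0" name="stop" interval="0" timeout="120s"/>
    </operations>
  </primitive>
</resources>
"""

UNIT_FACTORS = {"s": 1, "sec": 1, "m": 60, "min": 60, "h": 3600, "hr": 3600}


def to_seconds(value):
    """Convert '90', '90s', '2m' or '1h' to seconds (simplified parser)."""
    match = re.fullmatch(r"(\d+)\s*(s|sec|m|min|h|hr)?", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout format: {value!r}")
    unit = match.group(2) or "s"  # a bare number is treated as seconds
    return int(match.group(1)) * UNIT_FACTORS[unit]


def short_stop_timeouts(cib_xml, minimum_seconds):
    """Return (resource id, timeout in seconds) for stop ops below the minimum."""
    root = ET.fromstring(cib_xml)
    flagged = []
    for primitive in root.iter("primitive"):
        for op in primitive.iter("op"):
            if op.get("name") != "stop":
                continue
            timeout = op.get("timeout")
            if timeout is None:
                continue  # no explicit timeout; the cluster default applies
            seconds = to_seconds(timeout)
            if seconds < minimum_seconds:
                flagged.append((primitive.get("id"), seconds))
    return flagged


if __name__ == "__main__":
    for resource_id, seconds in short_stop_timeouts(SAMPLE_CIB, 60):
        print(f"{resource_id}: stop timeout is only {seconds}s")
```

On a real cluster the XML could come from the output of `cibadmin --query` instead of the sample string. What counts as a sensible minimum depends on the resource, so the threshold here is just a placeholder.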
Look at resources that might take a long time to stop properly, especially under load. Just one example: unmounting a filesystem with many dirty buffers.

Best regards
Andreas Mock

-----Original Message-----
From: [email protected] [mailto:[email protected]] On behalf of Greg Woods
Sent: Monday, April 22, 2013 17:51
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] clean shutdown procedure?

On Mon, 2013-04-22 at 10:12 +1000, Andrew Beekhof wrote:
> On Saturday, April 20, 2013, Greg Woods wrote:
> > Often one of the nodes gets stuck at "Stopping HA Services"
>
> That means pacemaker is waiting for one of your resources to stop.
> Do you have anything that would take a long time (or fail to stop)?

Not that I am aware of. But some things that came up during this weekend's powerdown make me think that some of the stop actions are failing, because setting the stop-all-resources=true property sometimes caused nodes to be fenced. I always dread having to try and find useful information in the voluminous Pacemaker/Heartbeat logs, but I'll have to try. Of course, this doesn't happen on the test clusters, and it is hard to debug it when reproducing it requires creating a service outage on a production cluster.

--Greg

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
