Hi,

On Mon, Dec 22, 2008 at 12:18:06PM +0100, Tobias Appel wrote:
> Hi,
>
> sorry to bug you guys again before Christmas but I have a very weird
> error.
> I have a 2 node setup with drbd and Heartbeat 2.14. One resource group
> which contains Nagios (something like BigBrother).
>
> Now I configured everything and did some tests with starting and stopping
> heartbeat service on the servers - the failover worked.
>
> But if I run 'shutdown -r now' on the active node the server will not
> reboot and the resource group will not be moved to the passive node.
> When I run crm_mon I can see:
> nagios-core (lsb:nagios): Started node01 (unmanaged) FAILED
>
> The server will do nothing then. It will not reboot, the rest of the
> resource group is still running! The log file from nagios tells me it
> shut down correctly. I did browse through the big big ha-log but I
> couldn't find anything that would help me.
>
> pengine[27246]: 2008/12/22_11:47:11 WARN: unpack_rsc_op: Processing
> failed op nagios-core_stop_0 on node01: Error
>
> I really have no idea what to look for or what to do.

A resource failed to stop. That's typically a reason to kill (fence) the
node, but you probably don't have stonith set up. If a resource can't
be stopped and stonith isn't enabled, the cluster can't be sure the
resource is really down, so it won't start it anywhere else.

Thanks,

Dejan

> Best Regards,
> Tobi
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
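
As a rough illustration of the rule described in the reply (a failed stop
normally leads to fencing, and without stonith the resource stays blocked),
here is a toy Python model. It is only a sketch of the decision, not
Heartbeat's actual policy engine code; the names Recovery and
recovery_after_stop are made up for this example.

```python
from enum import Enum


class Recovery(Enum):
    """Possible outcomes after the cluster asks a resource to stop (toy model)."""
    NORMAL = "stopped cleanly; safe to start on the other node"
    FENCE_NODE = "stop failed; fence the node, then start elsewhere"
    BLOCKED = "stop failed and no stonith; resource is not started anywhere"


def recovery_after_stop(stop_succeeded: bool, stonith_enabled: bool) -> Recovery:
    """Hypothetical helper mirroring the rule from the reply above."""
    if stop_succeeded:
        return Recovery.NORMAL
    if stonith_enabled:
        # The node can be killed, so the resource is guaranteed to be down.
        return Recovery.FENCE_NODE
    # Nothing proves the resource is down, so starting it elsewhere is unsafe.
    return Recovery.BLOCKED


if __name__ == "__main__":
    # The situation in this thread: nagios-core_stop_0 failed, no stonith.
    print(recovery_after_stop(stop_succeeded=False, stonith_enabled=False).value)
```

In this model the case from the thread ends up in BLOCKED, which matches the
"(unmanaged) FAILED" state shown by crm_mon.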
