Hi,
I'm running a two-node failover cluster. Yesterday the cluster tried to manage a state transition. In the log files I found the following entries:

  heartbeat[6905]: 2009/02/10_21:45:55 WARN: node nagios-drbd2: is dead
  heartbeat[6905]: 2009/02/10_21:45:55 info: Link nagios-drbd2:eth1 dead.

Shortly afterwards the node that was still alive tried to take over the resources and wrote the following entries to the log file (the resource "ipaddress" is an example; there are many more entries for the other resources that were running on the cluster):

  pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Action resource_nagios_ipaddress_stop_0 on nagios-drbd2 is unrunnable (offline)
  pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Marking node nagios-drbd2 unclean

Furthermore, there are several entries like this:

  stonithd[6916]: 2009/02/10_21:46:30 ERROR: Failed to STONITH the node nagios-drbd2: optype=RESET, op_result=TIMEOUT

STONITH runs via ssh over a direct link between the two nodes. Since node 2 was down, the shutdown command never reached its destination.

My questions are:

- Why did the surviving node try to stop resources on a cluster node that is considered dead?
- Why did STONITH try to shut down a node that is considered down? (For safety reasons, I think.)
- Shouldn't the resources just be started on the surviving node without any further action?
- Did I miss something in the default behaviour of heartbeat? Maybe a timeout?
- Would a hardware STONITH device solve such problems in the future?

Entries like the ones shown above fill the log from the time the node was found dead until I got to my workstation this morning.
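To quantify how often these entries repeat, the log can be tallied per daemon and severity. The following is only a minimal sketch: it assumes every line follows the same "daemon[pid]: date_time LEVEL: message" layout as the excerpts quoted in this mail, and the names parse_line, summarize and count_stonith_failures are illustrative helpers, not part of heartbeat.

```python
import re
from collections import Counter

# Layout assumed from the excerpts quoted in this mail:
#   daemon[pid]: YYYY/MM/DD_HH:MM:SS LEVEL: message
LOG_LINE = re.compile(
    r"^(?P<daemon>\w+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Za-z]+):\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per (daemon, LEVEL); levels are upper-cased so 'info' == 'INFO'."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry["daemon"], entry["level"].upper())] += 1
    return counts


def count_stonith_failures(lines):
    """Count stonithd entries whose message reports a failed STONITH attempt."""
    total = 0
    for line in lines:
        entry = parse_line(line)
        if entry and entry["daemon"] == "stonithd" and "Failed to STONITH" in entry["message"]:
            total += 1
    return total


# The five log lines quoted in this mail, used as sample input.
SAMPLE = [
    "heartbeat[6905]: 2009/02/10_21:45:55 WARN: node nagios-drbd2: is dead",
    "heartbeat[6905]: 2009/02/10_21:45:55 info: Link nagios-drbd2:eth1 dead.",
    "pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Action "
    "resource_nagios_ipaddress_stop_0 on nagios-drbd2 is unrunnable (offline)",
    "pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Marking node "
    "nagios-drbd2 unclean",
    "stonithd[6916]: 2009/02/10_21:46:30 ERROR: Failed to STONITH the node "
    "nagios-drbd2: optype=RESET, op_result=TIMEOUT",
]

if __name__ == "__main__":
    for (daemon, level), n in sorted(summarize(SAMPLE).items()):
        print(daemon, level, n)
    print("failed STONITH attempts:", count_stonith_failures(SAMPLE))
```

Run against the full log file instead of SAMPLE (for example by passing the lines of an open file), the same counters would show how many STONITH attempts timed out overnight.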
With kind regards
Kai Zemke

smartnet Online Service GmbH, Schnackenburgallee 177, 22525 Hamburg
http://www.smartnet.de

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
