As you guys recall, I have set up a heartbeat/drbd-based system to replace an aging drbd solution.

The new system is installed but has not been activated yet. Thanks to some self-checking scripts, I noticed that heartbeat had died on one machine. Looking through the logs, I found this in ha-log.2:

Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: WARN: Managed HBREAD process 3279 killed by signal 24 [SIGXCPU - CPU limit exceeded].
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: ERROR: Managed HBREAD process 3279 dumped core
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: ERROR: HBREAD process died. Beginning communications restart process for comm channel 0.
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: info: glib: UDP Broadcast heartbeat closed on port 12694 interface eth1 - Status: 1
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: WARN: Managed HBWRITE process 3278 killed by signal 9 [SIGKILL - Kill, unblockable].
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: ERROR: Both comm processes for channel 0 have died. Restarting.
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: info: glib: UDP Broadcast heartbeat started on port 12694 (12694) interface eth1
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: info: glib: UDP Broadcast heartbeat closed on port 12694 interface eth1 - Status: 1
Dec 13 17:13:14 pfs-srv3 heartbeat: [1243]: info: Communications restart succeeded.
Dec 16 10:29:38 pfs-srv3 heartbeat: [1269]: CRIT: Emergency Shutdown: Master Control process died.
Dec 16 10:29:38 pfs-srv3 heartbeat: [1269]: CRIT: Killing pid 1243 with SIGTERM
Dec 16 10:29:38 pfs-srv3 heartbeat: [1269]: CRIT: Killing pid 7247 with SIGTERM
Dec 16 10:29:38 pfs-srv3 heartbeat: [1269]: CRIT: Killing pid 7248 with SIGTERM
Dec 16 10:29:38 pfs-srv3 heartbeat: [1269]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

It looks like heartbeat ran into two separate problems: on Dec 13 the HBREAD process was killed by SIGXCPU (after which the communications channel was restarted), and on Dec 16 the master control process died, which triggered an emergency shutdown. Any ideas as to why this could have happened?
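For reference, "signal 24 [SIGXCPU - CPU limit exceeded]" is the signal the kernel sends when a process uses more CPU time than its RLIMIT_CPU soft limit allows; the default action for SIGXCPU is to terminate with a core dump, which matches the "dumped core" line above. The sketch below is not heartbeat code and says nothing about what set the limit on HBREAD; it is a minimal Python illustration of the mechanism, assuming a Unix system where the hard CPU limit is high enough to run it. The function name `demo_sigxcpu` and its parameter are hypothetical.

```python
import resource
import signal
import time


def demo_sigxcpu(extra_cpu_seconds=1):
    """Burn CPU until the RLIMIT_CPU soft limit triggers SIGXCPU.

    Hypothetical illustration, not heartbeat code. Catches the signal
    instead of dying, then restores the original limit and handler.
    Returns the number of the signal that was caught.
    """
    received = []

    def on_xcpu(signum, frame):
        # Installing a handler overrides the default action
        # (terminate + core dump) so the demo survives.
        received.append(signum)

    old_handler = signal.signal(signal.SIGXCPU, on_xcpu)
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_CPU)
    try:
        # RLIMIT_CPU counts total CPU time since the process started,
        # so the new soft limit is set relative to time already used.
        new_soft = int(time.process_time()) + 1 + extra_cpu_seconds
        if old_hard != resource.RLIM_INFINITY and new_soft > old_hard:
            raise RuntimeError("hard RLIMIT_CPU is too low for this demo")
        resource.setrlimit(resource.RLIMIT_CPU, (new_soft, old_hard))
        deadline = new_soft + 5  # safety net in case no signal arrives
        while not received and time.process_time() < deadline:
            pass  # spin to consume CPU time
    finally:
        # Put the original soft limit and signal disposition back.
        resource.setrlimit(resource.RLIMIT_CPU, (old_soft, old_hard))
        if old_handler is not None:
            signal.signal(signal.SIGXCPU, old_handler)
    if not received:
        raise RuntimeError("SIGXCPU was not delivered")
    return received[0]


if __name__ == "__main__":
    print(signal.Signals(demo_sigxcpu()).name)  # prints "SIGXCPU"
```

Without the handler, the same process would simply be killed and leave a core file, which is what the log reports for HBREAD. On Linux, `ulimit -t` in the shell that starts heartbeat, or the "Max cpu time" line in `/proc/<pid>/limits`, shows the CPU limit a running process actually has.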
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
