On 2007/12/31 15:07, chefren wrote:
> And look at the workings of your heartbeat monitor: I bet it needs a loop in
> the software that "pings" it. With software failures: Big chance that loop
> still works and thus the heartbeat monitor isn't triggered while the system
> as a whole can be considered broken.

Even so, it still allows recovery from some serious problems without
touching the machine. There are quite a few situations where this could
be very useful. It might not be worth the extra expense and complexity
of adding an external device, but watchdog timers aren't too uncommon
in PC hardware these days.

> Your heartbeat monitor also needs a way
> to power-cycle the whole system. Relays? How is/are these powered?  Don't
> forget for all the cables and connectors needed.

In the case of the hardware Nick mentioned, there should be a watchdog
timer in the I/O controller hub (82801AA ICH). Adding support for this
might be as simple as adding the device ID to /sys/dev/pci/ichwdt.c.
To test, set sysctl kern.watchdog.auto=0 and kern.watchdog.period=30,
then wait 30 seconds for the machine to reboot. See watchdog(4) and
watchdogd(8) ("man -k watchdog" gives a list of device drivers
supporting watchdog timers).
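
For what it's worth, the test could be scripted roughly like this. It is
only a sketch: the DRY_RUN variable and the run helper are names I made
up, and it assumes the driver has attached so that the kern.watchdog
sysctls exist. It defaults to printing the commands, because a real run
as root should reset the machine.

```shell
#!/bin/sh
# Sketch: arm the watchdog with nothing retriggering it, then wait for
# the hardware to reset the machine.
# DRY_RUN and run() are hypothetical helpers, not part of OpenBSD.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (safe default)
PERIOD=30                 # timeout in seconds, as in the test above

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"         # show what would be executed
    else
        "$@"              # really execute it (needs root)
    fi
}

# Stop the kernel from retriggering the timer on its own.
run sysctl kern.watchdog.auto=0
# Arm the timer; with nothing retriggering it, it should expire.
run sysctl "kern.watchdog.period=${PERIOD}"
```

Running it with DRY_RUN=0 as root performs the real test. If the box is
still up well after the period has passed, the timer is probably not
being driven correctly.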

The main docs for driving the ICH* watchdog timers are here:
http://download.intel.com/design/chipsets/applnots/29227301.pdf
(also see http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ichwd/
which supports 82801AA).
