Albert Chu wrote:

> Hey Frank,
>
> This is indeed very strange.  I assume the reboots are because the timer
> eventually times out, perhaps because the resets are no longer working
> (let's say the BMC goes out to lunch).

I don't think so, because in the tests I repeat the resets every second
and I always see whether they succeed or not. Many of them are rejected
with some kind of error message, but it never happens that all of them
fail for more than one minute.

However, when I loop "bmc-watchdog -g" I get the strangest results, with
all fields showing complete nonsense, like

  Initial Countdown:     6553 sec
  Present Countdown:     0 sec

and a second later

  Initial Countdown:     900 sec
  Present Countdown:     24513 sec

and so on. The action field etc. also change their values. If the timer
just ran down, the host would reset and not power off. So I guess that
the ILOM is just so buggy that it gets confused by being polled or
reset :-(

> Does the bmc-watchdog log say anything interesting?  Normally
> it's /var/log/freeipmi/bmc-watchdog.log.

It says a lot, but nothing just before the shutdown that it hadn't
shown before. E.g.:

[Jul 05 08:38:08]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: Invalid argument
[Jul 05 08:38:18]: Get Cmd: ipmi_kcs_cmd: driver timeout
[Jul 05 08:38:22]: Get Cmd: cmd error: 2h
[Jul 05 08:38:38]: _get_watchdog_timer_cmd: fiid_obj_get: 'timeout_action': data not available
[Jul 05 08:38:38]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: Invalid argument
[Jul 05 08:38:44]: Set Cmd: ipmi_kcs_cmd: driver timeout
[Jul 05 08:38:50]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:39:01]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:39:23]: _get_watchdog_timer_cmd: fiid_obj_get: 'timeout_action': data not available
[Jul 05 08:39:27]: _get_watchdog_timer_cmd: fiid_obj_get: 'initial_countdown_value': data not available
[Jul 05 08:39:35]: _get_watchdog_timer_cmd: fiid_obj_get: 'initial_countdown_value': data not available
[Jul 05 08:39:51]: Get Cmd: cmd error: 80h
[Jul 05 08:39:52]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:40:21]: Get Cmd: ipmi_kcs_cmd: driver timeout

Strangely enough, the watchdog reacts a lot more quickly and stably
when I poll it through the network interface with
"ipmitool ... bmc watchdog reset" or "get". It responds immediately,
always with correct values, and never shuts down.

Maybe that's because I don't have any special driver loaded on Linux?
The Sun driver is not available for Linux as far as I understood, so
I'm just using "bmc-watchdog -g" without any drivers.

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *
