Hi Al,

Albert Chu wrote:
> Hey Dave, Frank,
>
> As discussed in the previous thread, there was a corner case in the
> bmc-watchdog workaround I previously did. I then discovered another
> corner case w/ the workaround.
>
> There is a new beta here.

Sorry, I was away, but I'm going to test the new beta now. During my
absence the Sun X4100M2 produced two strange things:

1) bmc-watchdog: Get Watchdog Timer Error: No error message found for
   command 25h, network function 06h, and completion code 80h. Please
   report to <[email protected]>

2) The really bad thing: three of the X4100M2 were rebooted by the
   watchdog, I guess as a reaction to a "bmc-watchdog -s -k" call. The
   timer runs 15 minutes, and I reset the watchdog from two independent
   instances every 3 minutes.

On all three machines I found this in the logs:

  Jul 3 21:03:01 sunserver8 /usr/sbin/cron[11808]: (root) CMD (/usr/bin/bmc-reset)
  Jul 3 21:03:04 sunserver8 pm-profiler: Power Button pressed, executing /sbin/shutdown -h now
  Jul 3 21:03:04 sunserver8 shutdown[11853]: shutting down for system halt

The bmc-reset script just does this:

  for name in `seq 1 15`
  do
      # -s -k means: reset if running. Could be that the timer was
      # stopped because the init script failed to set it up. We should
      # not start it then.
      output=`/usr/sbin/bmc-watchdog -s -k 2>&1`
      exitstatus=$?
      if [ "$exitstatus" != "0" ]
      then
          sleep 3
      else
          exit 0
      fi
  done

There were always 2-3 seconds between the cron entry and the shutdown,
so I guess the ILOM of the Sun initiated the shutdown in response to
the "bmc-watchdog -s -k" command. The timer cannot have run down,
because I get an email for every failed attempt to reset the watchdog
and should have gotten 3-4 of them in the 15 minutes the timer runs.

Has anything like this been reported before?
Btw, Sun first refused to develop a firmware update for the X4100M2
because it is EOL, but due to our 5-year support warranty they are
forced to do so ;-) Now they are developing a patch for a newer
machine, because they stated that the error exists in many of the
SunFire machines, and will then backport it to the 4100.

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *
