Hi,
First, please let me apologize for not having the 1.8.10
patch-and-outstanding-issues list completed yet. I had a series of
crises to deal with since I posted my intent to get started on this and
so was delayed. However, all that is behind me now (knock on wood ;-)
and I'm now back to working on the preliminary 1.8.10 full-time. I
expect to have the list of outstanding patches/issues posted to this
mailing list soon for discussion.
I also mentioned some time back that I wanted to add support to ipmitool
for turning off a running watchdog timer. As you may recall, during the
watchdog discussion several months back, we'd decided that full watchdog
"set" command support is too dangerous and we subsequently had some
patches pulled from the cvs tree which allowed that functionality.
However, using the "set" command to turn off a running watchdog timer
would be very useful. (Of course any of the watchdog set commands can
still be sent via the raw command interface.)
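
For illustration, the 6-byte Set Watchdog Timer request that shuts the timer off could be built roughly as follows. This is a minimal sketch, not the patch itself: build_wdt_off_payload() and the WDT_* names are hypothetical, the byte values mirror the ones used in the attached patch, and it assumes the countdown field is expressed in 100 ms counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; values mirror the attached patch. */
#define WDT_USE_SMS_OS    0x04  /* timer use: SMS/OS */
#define WDT_ACTION_NONE   0x00  /* timeout action: no action */
#define WDT_CLEAR_SMS_OS  0x10  /* clear the SMS/OS expiration flag */

/*
 * Fill a 6-byte request payload that leaves the timer stopped, with
 * "No action" configured and an initial countdown of `seconds`.
 */
static void build_wdt_off_payload(uint8_t out[6], unsigned seconds)
{
    unsigned counts;

    assert(seconds <= 6553);        /* must fit in 16 bits of 100 ms counts */
    counts = seconds * 10;          /* assumption: 100 ms per count */

    out[0] = WDT_USE_SMS_OS;
    out[1] = WDT_ACTION_NONE;
    out[2] = 0x00;                  /* pre-timeout interval */
    out[3] = WDT_CLEAR_SMS_OS;
    out[4] = counts & 0xff;         /* countdown LSB */
    out[5] = (counts >> 8) & 0xff;  /* countdown MSB */
}
```

With seconds = 300 (5 minutes) the countdown bytes come out as 0xb8 and 0x0b, which are the values the patch hard-codes.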
While browsing around the cvs tree, I noticed that a (second) set of
watchdog commands is present under the mc command, having been added
since 1.8.9 was released. This seemed to me to be an ok place to have
some watchdog support so I've gone ahead and made some modifications to
the existing watchdog commands there. I've expanded the "get" command
results displayed, made "reset" safe (see below), and changed "set" to
do one job only -- turn off a running timer. I've also added man page
support for this new functionality. I am attaching a patch for review
and comments. Any and all (constructive ;-) comments are much
appreciated.
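
As a rough guide to what the expanded "get" output reports, the decoding of the response bytes can be sketched like this. The helper names are hypothetical and this is not the code in the patch; it assumes the low three bits of the timer-use byte select the use, bit 6 is the "running" flag, and the countdown is in 100 ms counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the strings match the table in the attached patch. */
static const char *wdt_use_names[8] = {
    "Reserved", "BIOS FRB2", "BIOS/POST", "OS Load",
    "SMS/OS", "OEM", "Reserved", "Reserved"
};

/* Name of the timer use; the flag bits are masked off before indexing. */
static const char *wdt_use_name(uint8_t timer_use)
{
    unsigned idx = timer_use & 0x07;

    assert(idx < sizeof(wdt_use_names) / sizeof(wdt_use_names[0]));
    return wdt_use_names[idx];
}

/* Non-zero when bit 6 (timer started/running) is set. */
static int wdt_is_running(uint8_t timer_use)
{
    return (timer_use & 0x40) != 0;
}

/* Convert an LSB/MSB countdown pair (100 ms counts) to whole seconds. */
static unsigned wdt_countdown_seconds(uint8_t lsb, uint8_t msb)
{
    return ((((unsigned)msb) << 8) | lsb) / 10;
}
```

For example, a timer-use byte of 0x44 decodes as a running SMS/OS timer, and the countdown bytes 0xb8/0x0b decode to 300 seconds.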
For folks concerned about even having a reset command at all (valid
concerns imo), I want to add a few notes on why I believe the "reset"
I've implemented here is safe. I ran some tests of this reset command
in various scenarios and saw the behavior outlined below. I only ran
these tests on one system and one (2.6.18) kernel so more extensive testing
should be done to be sure it works ok on other systems before we roll to
1.8.10. (And I intend to do more testing myself, too.) If issues are
found that we aren't able to fully address, we can ifdef or pull the
reset code.
First, reset run in-band:
1) if the ipmi watchdog driver is not yet started, "reset" starts the
timer but the action is "No action" so when the countdown ends, it does
nothing.
2) if the ipmi watchdog driver is started with start_now=0, it does the
same as #1 -- timer started but action is "No action".
3) if the ipmi watchdog driver is started with start_now=1, "reset"
resets the timer back to 5 minutes (a semi-arbitrary time I chose so it
would be similar to the default time setting on many BMCs). Other
values set by the watchdog driver are unchanged. Note: this is where I
see the most value for the reset command -- it gives more time if/when
occasionally needed.
4) After using "ipmitool mc watchdog off" to turn off a running timer,
the action is set to "No action". A reset command will reset the timer,
but since it retains the former action setting of "No action", nothing
happens when the countdown ends.
5) After sending '-V' to the ipmi watchdog driver for a graceful
shutdown: The watchdog driver doesn't appear to shut off the timer, but
sets the action to "No action" -- the countdown ends and nothing happens
(similar to #1, #2, #4 above). A subsequent "reset" command will
reset the timer, but there will be "No action" when the countdown ends.
Second, reset run out-of-band:
1) if the ipmi watchdog driver is not present/started, "reset" starts
the timer but the action is "No action", so the countdown runs and
nothing happens when it completes.
2) if the ipmi watchdog driver is present and start_now=1 is set,
"reset" will restart the countdown but will not change anything else set
by the ipmi watchdog driver (e.g., if the driver's action is to reset
the system, the reset will still be triggered after the new countdown
period ends).
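
The common thread in both lists is that "reset" only restarts the countdown and leaves the configured action alone. A toy model of that observation is sketched below; struct wdt_model and wdt_model_reset() are hypothetical names, and this is neither BMC firmware nor ipmitool code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the timer state discussed above (hypothetical). */
struct wdt_model {
    uint8_t  action;   /* 0 = no action, 1 = hard reset, 2 = power down, ... */
    uint16_t initial;  /* initial countdown, in 100 ms counts */
    uint16_t present;  /* present countdown, in 100 ms counts */
    int      running;  /* non-zero while the countdown is active */
};

/* Restart the countdown; the action set earlier is deliberately untouched. */
static void wdt_model_reset(struct wdt_model *w)
{
    assert(w != NULL);
    w->present = w->initial;
    w->running = 1;
}
```

In this model, a timer configured by the driver with a hard-reset action still has that action after wdt_model_reset(), while a timer left at "No action" simply counts down again and does nothing.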
--
Again, all comments, concerns, and feedback are much appreciated.
Thanks,
Carol Hebert
--- ipmitool.orig/lib/ipmi_mc.c 2007-12-14 12:56:53.000000000 -0800
+++ ipmitool/lib/ipmi_mc.c 2008-04-30 16:36:40.000000000 -0700
@@ -155,8 +155,9 @@
struct bitfield_data * bf;
printf("MC Commands:\n");
printf(" reset <warm|cold>\n");
+ printf(" guid\n");
printf(" info\n");
- printf(" wdt\n");
+ printf(" watchdog <get|reset|off>\n");
printf(" selftest\n");
printf(" getenables\n");
printf(" setenables <option=on|off> ...\n");
@@ -166,6 +167,17 @@
}
}
+
+static void
+print_watchdog_usage(void)
+{
+ lprintf(LOG_NOTICE, "usage: watchdog <command>:");
+ lprintf(LOG_NOTICE, " get : Get Current Watchdog settings");
+ lprintf(LOG_NOTICE, " reset : Restart Watchdog timer based on most recent settings");
+ lprintf(LOG_NOTICE, " off : Shut off a running Watchdog timer");
+}
+
+
/* ipmi_mc_get_enables - print out MC enables
*
* @intf: ipmi inteface
@@ -251,12 +263,12 @@
if (strncmp(argv[i]+nl+1, "off", 3) == 0) {
printf("Disabling %s\n", bf->desc);
en &= ~bf->mask;
- }
+ }
else if (strncmp(argv[i]+nl+1, "on", 2) == 0) {
printf("Enabling %s\n", bf->desc);
en |= bf->mask;
- }
- else {
+ }
+ else {
lprintf(LOG_ERR, "Unrecognized option: %s", argv[i]);
}
}
@@ -470,7 +482,7 @@
if (rsp->ccode) {
lprintf(LOG_ERR, "Bad response: (%s)",
- val2str(rsp->ccode, completion_code_vals));
+ val2str(rsp->ccode, completion_code_vals));
return -1;
}
@@ -545,25 +557,25 @@
*/
const char *wdt_use_string[8] = {
- "reserved",
+ "Reserved",
"BIOS FRB2",
"BIOS/POST",
"OS Load",
"SMS/OS",
"OEM",
- "reserved",
- "reserved"
+ "Reserved",
+ "Reserved"
};
const char *wdt_action_string[8] = {
- "no action",
+ "No action",
"Hard Reset",
"Power Down",
"Power Cycle",
- "reserved",
- "reserved",
- "reserved",
- "reserved"
+ "Reserved",
+ "Reserved",
+ "Reserved",
+ "Reserved"
};
static int
@@ -579,32 +591,37 @@
req.msg.data_len = 0;
rsp = intf->sendrecv(intf, &req);
- if (!rsp) {
- printf("no response\n");
+ if (rsp == NULL) {
+ lprintf(LOG_ERR, "Get Watchdog Timer command failed");
return -1;
}
if (rsp->ccode) {
- printf("returned CC code 0x%02x\n", rsp->ccode);
+ lprintf(LOG_ERR, "Get Watchdog Timer command failed: %s",
+ val2str(rsp->ccode, completion_code_vals));
return -1;
}
wdt_res = (struct ipm_get_watchdog_rsp *) rsp->data;
- printf("Timer Use: 0x%02x - %s\n", wdt_res->timer_use, wdt_use_string[wdt_res->timer_use]);
- printf("Timer Actions: 0x%02x - %s\n", wdt_res->timer_actions, wdt_action_string[wdt_res->timer_actions]);
- printf("Pre-timeout interval: 0x%02x\n", wdt_res->pre_timeout);
- printf("Timer Use Expiration: 0x%02x\n", wdt_res->timer_use_exp);
- printf("Initial Countdown: %i ms\n",
- (wdt_res->initial_countdown_msb << 8) | wdt_res->initial_countdown_lsb);
- printf("Present Countdown: %i ms\n",
- (wdt_res->present_countdown_msb << 8) | wdt_res->present_countdown_lsb);
+ printf("Watchdog Timer Use: %s (0x%02x)\n",
+ wdt_use_string[wdt_res->timer_use & 0x07], wdt_res->timer_use);
+ printf("Watchdog Timer Is: %s\n",
+ wdt_res->timer_use & 0x40 ? "Started/Running" : "Stopped");
+ printf("Watchdog Timer Actions: %s (0x%02x)\n",
+ wdt_action_string[wdt_res->timer_actions & 0x07], wdt_res->timer_actions);
+ printf("Pre-timeout interval: %d seconds\n", wdt_res->pre_timeout);
+ printf("Timer Expiration Flags: 0x%02x\n", wdt_res->timer_use_exp);
+ printf("Initial Countdown: %i sec\n",
+ ((wdt_res->initial_countdown_msb << 8) | wdt_res->initial_countdown_lsb)/10 );
+ printf("Present Countdown: %i sec\n",
+ (((wdt_res->present_countdown_msb << 8) | wdt_res->present_countdown_lsb)) / 10);
return 0;
}
-/* ipmi_mc_set_watchdog
+/* ipmi_mc_shutoff_watchdog
*
* @intf: ipmi interface
*
@@ -612,7 +629,7 @@
* returns -1 on error
*/
static int
-ipmi_mc_set_watchdog(struct ipmi_intf * intf, int argc, char ** argv)
+ipmi_mc_shutoff_watchdog(struct ipmi_intf * intf)
{
struct ipmi_rs * rsp;
struct ipmi_rq req;
@@ -624,31 +641,43 @@
req.msg.data = msg_data;
req.msg.data_len = 6;
- printf("FIXME - not fully implemented\n");
+ /*
+ * The only set cmd we're allowing is to shut off the timer.
+ * Turning on the timer should be the job of the ipmi watchdog driver.
+ * See 'modinfo ipmi_watchdog' for more info. (NOTE: the reset
+ * command will restart the timer if it's already been initialized.)
+ *
+ * Out-of-band watchdog set commands can still be sent via the raw
+ * command interface but this is a very dangerous thing to do since
+ * a periodic "poke"/reset over a network is unreliable. This is
+ * not a recommended way to use the IPMI watchdog commands.
+ */
- msg_data[0] = 0x03; /* os load*/
- msg_data[1] = 0x02; /* action power down */
- msg_data[2] = 10; /* pretimeout */
- msg_data[3] = 0;
- msg_data[4] = 10; /* timeout lsb in 100ms/count */
- msg_data[5] = 0; /* timeout lsb */
+ msg_data[0] = IPM_WATCHDOG_SMS_OS;
+ msg_data[1] = IPM_WATCHDOG_NO_ACTION;
+ msg_data[2] = 0x00; /* pretimeout interval */
+ msg_data[3] = IPM_WATCHDOG_CLEAR_SMS_OS;
+ msg_data[4] = 0xb8; /* countdown lsb (100 ms/count) */
+ msg_data[5] = 0x0b; /* countdown msb - 5 mins */
rsp = intf->sendrecv(intf, &req);
- if (!rsp) {
- printf("no response\n");
+ if (rsp == NULL) {
+ lprintf(LOG_ERR, "Watchdog Timer Shutoff command failed!");
return -1;
}
if (rsp->ccode) {
- printf("returned CC code 0x%02x\n", rsp->ccode);
+ lprintf(LOG_ERR, "Watchdog Timer Shutoff command failed! %s",
+ val2str(rsp->ccode, completion_code_vals));
return -1;
}
+ lprintf(LOG_NOTICE, "Watchdog Timer Shutoff successful -- timer stopped");
return 0;
}
-/* ipmi_mc_set_watchdog
+/* ipmi_mc_rst_watchdog
*
* @intf: ipmi interface
*
@@ -667,16 +696,20 @@
req.msg.data_len = 0;
rsp = intf->sendrecv(intf, &req);
- if (!rsp) {
- printf("no response\n");
+ if (rsp == NULL) {
+ lprintf(LOG_ERR, "Reset Watchdog Timer command failed!");
return -1;
}
if (rsp->ccode) {
- printf("returned CC code 0x%02x\n", rsp->ccode);
+ lprintf(LOG_ERR, "Reset Watchdog Timer command failed: %s",
+ (rsp->ccode == IPM_WATCHDOG_RESET_ERROR) ?
+ "Attempt to reset uninitialized watchdog" :
+ val2str(rsp->ccode, completion_code_vals));
return -1;
}
+ lprintf(LOG_NOTICE, "IPMI Watchdog Timer Reset - countdown restarted!");
return 0;
}
@@ -726,19 +759,22 @@
else if (!strncmp(argv[0], "selftest", 8)) {
rc = ipmi_mc_get_selftest(intf);
}
- else if (!strncmp(argv[0], "wdt", 3)) {
- if (argc < 2) {
- rc = ipmi_mc_get_watchdog(intf);
- }else if(strncmp(argv[1], "get", 3) == 0){
+ else if (!strncmp(argv[0], "watchdog", 8)) {
+ if (argc < 2 || strncmp(argv[1], "help", 4) == 0) {
+ print_watchdog_usage();
+ }
+ else if (strncmp(argv[1], "get", 3) == 0) {
rc = ipmi_mc_get_watchdog(intf);
- }else if(strncmp(argv[1], "set", 3) == 0){
- if(argc > 5)
- rc = ipmi_mc_set_watchdog(intf, argc-1, &(argv[1]));
- else
- printf("wdt set <use><action><pretimeout><countdown> FIXME - not fully implemented\n");
- }else if(strncmp(argv[1], "rst", 3) == 0){
+ }
+ else if(strncmp(argv[1], "off", 3) == 0) {
+ rc = ipmi_mc_shutoff_watchdog(intf);
+ }
+ else if(strncmp(argv[1], "reset", 5) == 0) {
rc = ipmi_mc_rst_watchdog(intf);
}
+ else {
+ print_watchdog_usage();
+ }
}
else {
lprintf(LOG_ERR, "Invalid mc/bmc command: %s", argv[0]);
--- ipmitool.orig/include/ipmitool/ipmi_mc.h 2007-04-26 06:19:33.000000000 -0700
+++ ipmitool/include/ipmitool/ipmi_mc.h 2008-04-30 16:20:36.000000000 -0700
@@ -114,4 +114,23 @@
unsigned char present_countdown_msb;
} __attribute__ ((packed));
+#define IPM_WATCHDOG_RESET_ERROR 0x80
+
+#define IPM_WATCHDOG_BIOS_FRB2 0x01
+#define IPM_WATCHDOG_BIOS_POST 0x02
+#define IPM_WATCHDOG_OS_LOAD 0x03
+#define IPM_WATCHDOG_SMS_OS 0x04
+#define IPM_WATCHDOG_OEM 0x05
+
+#define IPM_WATCHDOG_NO_ACTION 0x00
+#define IPM_WATCHDOG_HARD_RESET 0x01
+#define IPM_WATCHDOG_POWER_DOWN 0x02
+#define IPM_WATCHDOG_POWER_CYCLE 0x03
+
+#define IPM_WATCHDOG_CLEAR_OEM 0x20
+#define IPM_WATCHDOG_CLEAR_SMS_OS 0x10
+#define IPM_WATCHDOG_CLEAR_OS_LOAD 0x08
+#define IPM_WATCHDOG_CLEAR_BIOS_POST 0x04
+#define IPM_WATCHDOG_CLEAR_BIOS_FRB2 0x02
+
#endif /*IPMI_MC_H */
--- ipmitool.orig/doc/ipmitool.1 2007-12-14 11:29:03.000000000 -0800
+++ ipmitool/doc/ipmitool.1 2008-04-30 16:15:21.000000000 -0700
@@ -233,6 +233,36 @@
revision, firmware revision, IPMI version supported, manufacturer ID,
and information on additional device support.
.TP
+\fIwatchdog\fP <\fBcommand\fR>
+.br
+
+Perform various watchdog timer setting commands to view and
+change the current state of the timer.
+.RS
+.TP
+\fIget\fP
+.br
+
+Show current Watchdog Timer settings and countdown state.
+.TP
+\fIreset\fP
+.br
+
+Reset the Watchdog Timer to its most recent state and restart the
+countdown timer.
+.TP
+\fIoff\fP
+.br
+
+Turn off a currently running Watchdog countdown timer.
+.RE
+.TP
+\fIselftest\fP
+.br
+
+Check on the basic health of the BMC by executing the Get Self Test
+results command and report the results.
+.TP
\fIgetenables\fP
.br