Hey Won,

On Mon, 2009-01-26 at 18:53 -0800, Won De Erick wrote:
> ----- Original Message ----
>
> > From: Al Chu <ch...@llnl.gov>
> >
> > Hey Won,
> >
> > On Sun, 2009-01-25 at 23:00 -0800, Won De Erick wrote:
> > > I am forwarding this to the FreeIPMI users mailing list. Hope, I can get
> > > hints from you all.
> > > Thank you.
> > >
> > >
> > >
> > >
> > > ----- Forwarded Message ----
> > > From: Won De Erick
> > > To: Albert Chu
> > > Cc: freeipmi-devel@gnu.org
> > > Sent: Saturday, January 24, 2009 11:55:24 AM
> > > Subject: Re: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable
> > > to get SEL record
> > >
> > > Pls disregard previous email. I forgot to attach the file. :)
> >
> > Did you send me the wrong debug file? I see debug output from
> > ipmi-sensors??
> >
>
> I'm sorry, attached is the correct one.
Seems that this has a successful ipmi-sel execution in it. So not much
I can debug with :-(

>
> > > Hi Al,
> > >
> > > With IBM x3650, I noticed that ipmi-sel is unable to get the SEL record.
> > >
> > > # ipmi-sel --version
> > > IPMI Sensors [ipmi-sel-0.6.10]
> > >
> > > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > > ipmi_cmd_get_sel_entry: BMC busy
> > > ipmi-sel: unable to get SEL record
> > >
> > > After the above, the box automatically rebooted. Is this normal?
> >
> > I have never seen this behavior before, and I wouldn't consider it
> > "good" in any definition. This is likely a bug in the IBM
> > implementation. The "BMC busy" means exactly what it says, the BMC is
> > busy and cannot respond to IPMI requests. It by itself is not a
> > problem. For example, some other IPMI tasks are hogging resources. But
> > you should presumably be able to reach the card eventually. Is it
> > possible you have other IPMI things running in the background?
> >
>
> bmc-watchdog (as daemon) was the only thing running in the background.

This shouldn't be enough to cause enough IPMI to be *that* busy.

Here's a thought. Perhaps the ipmi-sel logs went full, the BMC card
went busy, and thus the bmc-watchdog couldn't perform IPMI and timed
out, thus leading to a reboot?? Obviously, it depends on how you setup
the bmc-watchdog.

>
> > > I then cleared the SEL records, thinking that the reboot might have been
> > > triggered due to a full SEL.
> >
> > I think this is a reasonable guess. It could be anything really.
> >
> > > # ipmi-sel -c
> > >
> > > # reboot
> > > # ipmi-sel
> > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > # ipmi-sel
> > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > >
> > > # reboot
> > > # ipmi-sel
> > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > >
> > > Then retried the previous command that caused an error.
> > >
> > > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > >
> > > # cat ibm3650-dsc2075-sel.txt
> > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > >
> > > Then the problem didn't occur anymore.
> > > Besides, what is the meaning of this OEM defined? I can't see any log
> > > that is more specific, or something like
> >
> > The system event log is allowed to store OEM defined information. Since
> > the information is defined by (in this case) IBM, I have no way to
> > convert the hex into something like what you're used to :-(
> >
>
> I think this is cool. So, is it safe to assume that the system
> rebooted if I see similar OEM defined info ( in this case OEM defined
> = 00 00 00 00 00 E3 25 86 80 00 00 FF 00)? Is there any possibility to
> integrate IBM's OEM defined info in the future too? :D

I'd be willing to integrate any vendors OEM defined
interpretation/parsing into FreeIPMI. The problem is, I do not know how
to interpret/parse any of their information :-(

As a customer, you should tell your vendor support about this. Each
user that complains makes it more possible for them to release the
information.

Al

> > > 220:19-Sep-2008 14:24:56:Power Unit Sys pwr monitor:Power Off/Power Down
> > > 221:19-Sep-2008 14:25:16:Power Unit Sys pwr monitor:Power Off/Power Down
> > >
> > > I've attached here the ipmi-sel debug output.
> > >
> > > Then one side question, I want to ask the possible reasons of the ff
> > > log obtained prior to clearing. I didn't change any in the system.
> > > I just noticed that the system halted serving and went back after 4-5
> > > minutes, w/out any other records in SEL that says the box hang and
> > > rebooted.
> > >
> > > 54:23-Jan-2009 11:28:55:System Event #0:System Reconfigured
> >
> > I'm not quite sure what you're asking. Are you asking why the above log
> > message occurs? I'm not too sure. It could really be for one of many
> > reasons. Maybe a BIOS changed for a firmware changed? The IPMI spec
> > doesn't really define when a "System Reconfigured" event must be
> > reported. It only defines that a "System Reconfigured" event can occur
> > and that manufacturers are free to determine what events will make that
> > information output to the event log.
> >
>
> You exactly got what I should mean. But aside from changes on the BIOS
> or BMC firmware, I want to know too if there are instances that the
> event would be reported if there are changes on the OS level. I just
> wondered why the "System Reconfigured" event log came out, where in
> fact no changes were made on the BIOS firmware or BMC firmware, or on
> the OS level. Sorry, this question may not be related to FreeIPMI
> anymore, but I just want to elicit some ideas from you.
>
> > Hope I was helpful,
> >
> > Al
> >
> > > Thanks,
> > >
> > > Won
> > >
> > >
> > >
> > --
> > Albert Chu
> > ch...@llnl.gov
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
>
> I am receiving mail delivery error(s) when sending mails to
> freeipmi-us...@gnu.org; freeipmi-de...@gnu.org.
>
> Thanks for the usual support and help,
>
> Won
>

--
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-devel mailing list
Freeipmi-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/freeipmi-devel
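
Since the OEM defined records quoted in this thread cannot be decoded without IBM's documentation, one modest thing a user can do is group identical payloads and note which record IDs carry them, then compare that against known reboot times. The Python below is only a rough sketch under stated assumptions: the two line shapes it recognizes are inferred from the ipmi-sel 0.6.10 output pasted above, the function names parse_sel_line and group_oem_records are made up for this illustration, and it says nothing about what the bytes actually mean.

```python
import re
from collections import defaultdict

# Line shapes assumed from the ipmi-sel 0.6.10 output quoted in this thread.
# Other FreeIPMI versions or output options may print records differently.
OEM_RE = re.compile(r"^(\d+):OEM defined = ((?:[0-9A-Fa-f]{2} ?)+)$")
EVENT_RE = re.compile(
    r"^(\d+):(\d{2}-[A-Za-z]{3}-\d{4}) (\d{2}:\d{2}:\d{2}):([^:]+):(.+)$"
)


def parse_sel_line(line):
    """Parse one line of ipmi-sel output; return a dict, or None if unrecognized."""
    line = line.strip()
    m = OEM_RE.match(line)
    if m:
        return {
            "id": int(m.group(1)),
            "kind": "oem",
            "data": bytes.fromhex(m.group(2)),  # raw bytes, meaning is vendor-defined
        }
    m = EVENT_RE.match(line)
    if m:
        return {
            "id": int(m.group(1)),
            "kind": "event",
            "date": m.group(2),
            "time": m.group(3),
            "sensor": m.group(4),
            "event": m.group(5),
        }
    return None


def group_oem_records(lines):
    """Map each distinct OEM payload (as a hex string) to the record IDs carrying it."""
    groups = defaultdict(list)
    for line in lines:
        rec = parse_sel_line(line)
        if rec and rec["kind"] == "oem":
            key = " ".join(f"{b:02X}" for b in rec["data"])
            groups[key].append(rec["id"])
    return dict(groups)


if __name__ == "__main__":
    # Sample taken verbatim from the SEL dump quoted in the thread.
    sample = [
        "1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00",
        "2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00",
        "3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00",
    ]
    for payload, ids in group_oem_records(sample).items():
        print(payload, "->", ids)
```

A repeated payload appearing once per reboot would be suggestive, but it is correlation only; confirming the meaning still requires the vendor's record definitions.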