----- Original Message ----
> From: Al Chu <[email protected]>
>
> Hey Won,
>
> On Mon, 2009-01-26 at 18:53 -0800, Won De Erick wrote:
> > ----- Original Message ----
> >
> > > From: Al Chu
> > >
> > > Hey Won,
> > >
> > > On Sun, 2009-01-25 at 23:00 -0800, Won De Erick wrote:
> > > > I am forwarding this to the FreeIPMI users mailing list. I hope I
> > > > can get hints from you all.
> > > > Thank you.
> > > >
> > > > ----- Forwarded Message ----
> > > > From: Won De Erick
> > > > To: Albert Chu
> > > > Cc: [email protected]
> > > > Sent: Saturday, January 24, 2009 11:55:24 AM
> > > > Subject: Re: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to get SEL record
> > > >
> > > > Please disregard my previous email. I forgot to attach the file. :)
> > >
> > > Did you send me the wrong debug file? I see debug output from
> > > ipmi-sensors??
> > >
> >
> > I'm sorry, attached is the correct one.
>
> This seems to contain a successful ipmi-sel execution, so there's not
> much I can debug with :-(
>
> > >
> > > > Hi Al,
> > > >
> > > > On the IBM x3650, I noticed that ipmi-sel is unable to get the SEL
> > > > record.
> > > >
> > > > # ipmi-sel --version
> > > > IPMI Sensors [ipmi-sel-0.6.10]
> > > >
> > > > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > > > ipmi_cmd_get_sel_entry: BMC busy
> > > > ipmi-sel: unable to get SEL record
> > > >
> > > > After the above, the box automatically rebooted. Is this normal?
> > >
> > > I have never seen this behavior before, and I wouldn't consider it
> > > "good" by any definition. This is likely a bug in the IBM
> > > implementation. "BMC busy" means exactly what it says: the BMC is
> > > busy and cannot respond to IPMI requests. By itself, that is not a
> > > problem; for example, other IPMI tasks may be hogging resources. But
> > > you should presumably be able to reach the card eventually. Is it
> > > possible you have other IPMI things running in the background?
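Al's point that "BMC busy" is a transient condition suggests that a client can simply wait and retry. Below is a minimal sketch of retry with exponential backoff, assuming that FreeIPMI's "BMC busy" message corresponds to the IPMI completion code C0h ("Node Busy"). The `send` callable and `send_with_retry` are hypothetical stand-ins for whatever issues the request (for example, a Get SEL Entry); they are not FreeIPMI APIs.

```python
import time
from typing import Callable, Tuple

# Assumption: FreeIPMI's "BMC busy" corresponds to IPMI completion code C0h
# ("Node Busy"). 00h is "Command Completed Normally".
NODE_BUSY = 0xC0
COMPLETION_OK = 0x00


class BmcBusyError(Exception):
    """Raised when the BMC is still busy after all retries are used up."""


def send_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call the hypothetical `send` (returns (completion_code, data)) and
    retry with exponential backoff while the BMC answers Node Busy."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        code, data = send()
        if code == COMPLETION_OK:
            return data
        if code != NODE_BUSY:
            # Any other completion code is a real error; retrying won't help.
            raise RuntimeError(f"IPMI error, completion code 0x{code:02X}")
        if attempt < max_attempts:
            sleep(delay)  # give the BMC time to free its resources
            delay *= 2
    raise BmcBusyError(f"BMC still busy after {max_attempts} attempts")
```

For example, a fake `send` that answers busy twice and then succeeds returns its data after sleeping 0.5 s and then 1.0 s. Doubling the delay gives a busy BMC progressively more room to recover instead of flooding it with requests.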
> >
> > bmc-watchdog (as a daemon) was the only thing running in the background.
>
> That shouldn't generate enough IPMI traffic to make the BMC *that*
> busy. Here's a thought: perhaps the SEL filled up, the BMC card went
> busy, and so bmc-watchdog couldn't perform IPMI and timed out, leading
> to a reboot?? Obviously, it depends on how you set up bmc-watchdog.
>
This is my setup:
# bmc-watchdog -d -u 4 -p 0 -n -i 300 -l 0
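One way to test the "full SEL" theory before clearing the log is to look at the SEL's own bookkeeping. The sketch below decodes the response data of the IPMI Get SEL Info command, following a reading of the IPMI v2.0 field layout: SEL version, entry count, free space in bytes, two 4-byte timestamps (most recent addition and erase), and an operation-support byte whose bit 7 is the overflow flag (set when events were dropped for lack of space). The class and field names, the `is_full` heuristic, and the assumption that the completion code has already been stripped are illustrative; verify the offsets against the specification before relying on them.

```python
from dataclasses import dataclass

SEL_RECORD_SIZE = 16  # every SEL entry is 16 bytes


@dataclass
class SelInfo:
    version: int
    entries: int
    free_bytes: int
    last_add: int    # most recent addition timestamp
    last_erase: int  # most recent erase timestamp
    overflow: bool   # bit 7 of the operation-support byte

    @property
    def is_full(self) -> bool:
        # Heuristic: full if events were dropped or not one more record fits.
        return self.overflow or self.free_bytes < SEL_RECORD_SIZE


def parse_get_sel_info(data: bytes) -> SelInfo:
    """Parse Get SEL Info response data (completion code already removed).
    Multi-byte fields are least-significant byte first."""
    if len(data) < 14:
        raise ValueError("Get SEL Info response is too short")
    return SelInfo(
        version=data[0],
        entries=int.from_bytes(data[1:3], "little"),
        free_bytes=int.from_bytes(data[3:5], "little"),
        last_add=int.from_bytes(data[5:9], "little"),
        last_erase=int.from_bytes(data[9:13], "little"),
        overflow=bool(data[13] & 0x80),
    )
```

A set overflow flag or zero free space would support Al's theory; a SEL with plenty of free space would point elsewhere.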
I forgot to tell you that I am using the in-band mechanism. The IBM x3650
needs an RSA II card installed to get the BMC card working (I think this
is the built-in LAN management port that comes with the box).
> >
> > > > I then cleared the SEL records, thinking that the reboot might have
> > > > been triggered by a full SEL.
> > >
> > > I think this is a reasonable guess. It could be anything, really.
> > >
> > > > # ipmi-sel -c
> > > >
> > > > # reboot
> > > > # ipmi-sel
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > # ipmi-sel
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > >
> > > > # reboot
> > > > # ipmi-sel
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > > >
> > > > Then I retried the previous command that caused the error.
> > > >
> > > > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > > >
> > > > # cat ibm3650-dsc2075-sel.txt
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > > >
> > > > The problem didn't occur anymore.
> > > > Also, what is the meaning of this "OEM defined"? I can't see any log
> > > > that is more specific, something like
> > >
> > > The system event log is allowed to store OEM defined information. Since
> > > the information is defined by (in this case) IBM, I have no way to
> > > convert the hex into something like what you're used to :-(
> > >
> >
> > I think this is cool. So, is it safe to assume that the system
> > rebooted if I see similar OEM defined info (in this case, OEM defined
> > = 00 00 00 00 00 E3 25 86 80 00 00 FF 00)? Is there any possibility of
> > integrating IBM's OEM defined info in the future too? :D
>
> I'd be willing to integrate any vendor's OEM defined
This is nice to know.
:)
> interpretation/parsing into FreeIPMI. The problem is, I do not know how
> to interpret/parse any of their information :-(
>
> As a customer, you should tell your vendor's support about this. Each
> user who complains makes it more likely that they will release the
> information.
>
> Al
>
> > > > 220:19-Sep-2008 14:24:56:Power Unit Sys pwr monitor:Power Off/Power Down
> > > > 221:19-Sep-2008 14:25:16:Power Unit Sys pwr monitor:Power Off/Power Down
> > > >
> > > > I've attached the ipmi-sel debug output here.
> > > >
> > > > Then a side question: what are the possible reasons for the
> > > > following log entry, obtained prior to clearing? I didn't change
> > > > anything in the system. I just noticed that the system stopped
> > > > serving and came back after 4-5 minutes, without any other SEL
> > > > records saying that the box hung and rebooted.
> > > >
> > > > 54:23-Jan-2009 11:28:55:System Event #0:System Reconfigured
> > >
> > > I'm not quite sure what you're asking. Are you asking why the above log
> > > message occurs? I'm not too sure. It could really be for one of many
> > > reasons. Maybe a BIOS change or a firmware change? The IPMI spec
> > > doesn't really define when a "System Reconfigured" event must be
> > > reported. It only defines that a "System Reconfigured" event can occur,
> > > and manufacturers are free to decide which events will cause that
> > > information to be written to the event log.
> > >
> >
> > You got exactly what I meant. But aside from changes to the BIOS or
> > BMC firmware, I'd also like to know whether there are cases where the
> > event is reported because of changes at the OS level. I just wondered
> > why the "System Reconfigured" event appeared when, in fact, no changes
> > were made to the BIOS firmware, the BMC firmware, or the OS. Sorry,
> > this question may not be related to FreeIPMI anymore, but I just want
> > to elicit some ideas from you.
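Won's question about whether a recurring OEM record marks a reboot can be checked empirically by tallying identical OEM payloads and comparing the counts and record order against known reboot times. A minimal sketch, assuming the ipmi-sel 0.6.10 output format quoted in this thread (`ID:OEM defined = hex bytes` and `ID:date time:sensor:event`); other FreeIPMI versions format their output differently, and the helper names are hypothetical. Because the payload's meaning is IBM-defined, any link to reboots remains a hypothesis.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Line shapes taken from the ipmi-sel 0.6.10 output quoted in this thread.
OEM_RE = re.compile(r"^(\d+):OEM defined = ((?:[0-9A-Fa-f]{2} ?)+)$")
EVENT_RE = re.compile(
    r"^(\d+):(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}):([^:]*):(.*)$"
)


class SelLine(NamedTuple):
    record_id: int
    when: Optional[datetime]  # None for OEM records (no decoded timestamp)
    sensor: str
    event: str  # event text, or the raw OEM hex bytes


def parse_sel_lines(lines: Iterable[str]) -> List[SelLine]:
    """Parse ipmi-sel text output; lines matching neither shape are skipped."""
    parsed = []
    for raw in lines:
        line = raw.strip()
        m = OEM_RE.match(line)
        if m:
            payload = m.group(2).strip()
            parsed.append(SelLine(int(m.group(1)), None, "OEM defined", payload))
            continue
        m = EVENT_RE.match(line)
        if m:
            when = datetime.strptime(m.group(2), "%d-%b-%Y %H:%M:%S")
            parsed.append(SelLine(int(m.group(1)), when, m.group(3), m.group(4)))
    return parsed


def count_oem_payloads(records: Iterable[SelLine]) -> Counter:
    """Count how often each distinct OEM payload appears."""
    return Counter(r.event for r in records if r.sensor == "OEM defined")
```

Passing an open file such as `ibm3650-dsc2075-sel.txt` to `parse_sel_lines` works because it accepts any iterable of lines. In the output Won captured after two reboots, the `00 00 00 00 00 E3 25 86 80 00 00 FF 00` payload appears twice, which is consistent with, but does not prove, one such record per reboot.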
> >
> > > Hope I was helpful,
> > >
> > > Al
> > >
> > > > Thanks,
> > > >
> > > > Won
> > >
> > > --
> > > Albert Chu
> > > [email protected]
> > > Computer Scientist
> > > High Performance Systems Division
> > > Lawrence Livermore National Laboratory
> >
> > I am receiving mail delivery error(s) when sending mail to
> > [email protected]; [email protected].
> >
> > Thanks for the usual support and help,
> >
> > Won
>
> --
> Albert Chu
> [email protected]
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
_______________________________________________
Freeipmi-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/freeipmi-devel
