Thanks Jeff.
For the record, this may be helpful for others. Maybe it is already on
the mailing list, but the last time I searched for IPMI errors I couldn't
find any clue.
If you see messages like the ones below in the logs (these are from
CentOS 5.3), IPMI is not working properly, which basically means
Dell OpenManage is not working properly either.
After a few weeks with those IPMI errors the machine will freeze (kernel
2.6.18). It's really annoying, because the DRAC power options are
greyed out [1] while the IPMI timeout errors are occurring. The only way
to bring the box back to life after a freeze is to physically power cycle
it; that cannot be done remotely (magic SysRq can help, but not always).
Even that doesn't fix the IPMI errors: after a reboot they are back
straight away. To get rid of them for a short while (about a week) you
need to unplug the server from power, wait 20 seconds, and plug it back
in. But they will come back. Maybe in a week, maybe in two, but you will
see them again. After a few weeks (or months) the server will start
giving OOM-killer errors, and the machine will freeze again.
A while ago I updated the BMC firmware from 1.73 to 1.84 on two test
machines that had those errors, and so far so good: no issues, and the
timeouts are gone. A firmware update sounds simple, but it took me some
time to figure that out.
Here are some examples of those IPMI errors:
IPMI BT: timeout in XACTION [ B_BUSY H2B ] failed 2 retries, sending error response
IPMI: BT reset (takes 5 secs)
IPMI BT: timeout in XACTION [ B_BUSY H2B ] failed 2 retries, sending error response
IPMI: BT reset (takes 5 secs)
IPMI BT: timeout in XACTION [ B_BUSY H2B ] failed 2 retries, sending error response
IPMI: BT reset (takes 5 secs)
IPMI BT: timeout in XACTION [ B_BUSY H2B ] failed 2 retries, sending error response
IPMI: BT reset (takes 5 secs)
IPMI BT: timeout in XACTION [ B_BUSY H2B ] failed 2 retries, sending error response
IPMI: BT reset (takes 5 secs)
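If you want to check a whole fleet for these messages, something along
these lines might be a starting point. It is only a rough sketch: the
two patterns are taken from the excerpts above and may be worded
differently on other kernels, the function names are made up for
illustration, and /var/log/messages as the log location is an
assumption (adjust it for your syslog setup). It needs Python 3, so it
is probably easier to run it on copies of the logs than on the old
boxes themselves.

```python
import re
from typing import Dict, Iterable

# Patterns copied from the log excerpts in this post; other kernel
# versions may print these messages differently (assumption).
BT_TIMEOUT = re.compile(r"IPMI BT: timeout in XACTION")
BT_RESET = re.compile(r"IPMI: BT reset")


def count_ipmi_bt_errors(lines: Iterable[str]) -> Dict[str, int]:
    """Count BT timeout and BT reset messages in an iterable of log lines."""
    counts = {"timeouts": 0, "resets": 0}
    for line in lines:
        if BT_TIMEOUT.search(line):
            counts["timeouts"] += 1
        elif BT_RESET.search(line):
            counts["resets"] += 1
    return counts


def scan_log(path: str = "/var/log/messages") -> Dict[str, int]:
    """Scan one log file; the default path is an assumed syslog location."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as fh:
        return count_ipmi_bt_errors(fh)
```

Calling scan_log() on each machine's log and flagging anything with a
non-zero "timeouts" count should show which boxes are affected before
they freeze.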
Here is my old post about the same issue:
http://marc.info/?t=122409170300010&r=1&w=2
Regards,
Mikolaj
References
1. http://img135.imageshack.us/img135/7439/screenshot2lo7.jpg
On Fri, Oct 30, 2009 at 09:45:46AM -0500, [email protected] wrote:
> I believe if you go to support.dell.com you can see the changes for each
> release, not exactly sure how. Here is what I found:
>
> Jul 8 2003 10:59AM
> BMC v1.73
> - Maintenance
> GBP v1.01
> - No Change.
>
>
> Oct 17 2003 5:16PM
> BMC v1.80
> - Fan stepping adjusted to reduce system noise levels.
> GBP v1.01
> - No Change.
>
>
> Dec 1 2004 3:06PM
> BMC v1.84
> - Modified SEL thread to send a response for the SEL clear command
> without putting it into the IPMI thread queue.
> GBP v1.01
> - No Change.
>
>
> -Jeff
>
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of
> > Mikolaj Kucharski
> > Sent: Friday, October 30, 2009 5:03 AM
> > To: linux-poweredge-Lists
> > Subject: dell esm firmware changelog
> >
> > Hi,
> >
> > Not strictly related to Linux, but I'm tracking a bug which
> > hits my Linux fleet and I think that can be related to ESM
> > firmware (PE1750).
> >
> > Can anyone advise where I can find changelog for PE1750 ESM
> > firmware from 1.73 to 1.84? Full changelog would be even better.
--
best regards
q#