Rahul,

Yes, out-of-band recovery is a key function that all IPMI BMCs are
supposed to provide.  The BMC is generally not affected by what the OS
does, unless there are IPMI-aware applications running in the OS that
talk directly to the BMC.

There are a few possible explanations for this behavior:

1) IPMI LAN configuration.  Make sure that the IPMI LAN was properly
configured.  It sounds like you may have tested this beforehand.  Even
something like the ARP configuration could cause the port to no longer
be visible to the router.  

2) Basic BMC LAN functionality.  The verbose output doesn't show
ipmitool sending an RMCP ping, which it should attempt, unless you
specified OEM Supermicro.  Some other tools have a specific method to
invoke the RMCP ping, if you wish.  Ipmiutil has 'ipmiutil discover' and
freeipmi has 'rmcpping'.  Or, if it is a dedicated BMC NIC, it may even
answer normal pings.  This would determine whether the BMC is responding
on the LAN port at all.  Most likely it is not, since the BMC did not
respond to the GetChannelAuthCap command in the output below.

3) Some OS-resident (custom?) IPMI-aware application that may be causing
trouble/stress/configuration problems with the BMC.  In a healthy
system, the 'ps -ef' output on the target should show any ipmi-related
processes that are running.  The most common problem I've seen is that
these apps sometimes use SetSelTime frequently to change the BMC's
timestamp.  This can cause serious confusion for the BMC, since
SetSelTime is not intended to be used more than once per boot, or daily
(similar to NTP usage).  It is especially harmful if the BMC's timestamp
gets set backward while something else is in progress.

4) A bug in the BMC.  You didn't mention which vendor's IPMI BMC is
being used, but from the To list, it might be Dell (?).  Get the BMC
version number and find out if there is an upgrade from the vendor.
That is more important than what ipmitool does.  If the BMC is in a bad
state, the history from the IPMI SEL may be helpful to the vendor.  If
it is reproducible after an upgrade, the vendor should be able to fix
it.  
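
As a rough illustration of item 2, here is a minimal Python sketch that
builds an RMCP (ASF Presence) ping and the same GetChannelAuthCap request
that appears in the verbose output below (netfn 0x06, command 0x38, data
0x8e 0x04, Rs Addr 20, Rq Addr 81), then sends either one over UDP with a
timeout instead of hanging.  The function names are hypothetical, and the
byte layouts reflect my reading of the RMCP/ASF and IPMI v1.5 packet
formats, so treat it as a diagnostic aid rather than a reference
implementation.

```python
import socket

RMCP_PORT = 623  # UDP port shown in the verbose ipmitool output


def build_rmcp_ping(tag: int = 0) -> bytes:
    """ASF Presence Ping: 4-byte RMCP header plus 8-byte ASF message."""
    rmcp = bytes([0x06, 0x00, 0xFF, 0x06])  # version, reserved, seq (no ack), class ASF
    asf = bytes([0x00, 0x00, 0x11, 0xBE,    # ASF IANA enterprise number (4542)
                 0x80,                       # message type: Presence Ping
                 tag & 0xFF,                 # message tag, echoed back in the pong
                 0x00,                       # reserved
                 0x00])                      # data length
    return rmcp + asf


def checksum(data: bytes) -> int:
    """IPMI two's-complement checksum over the given bytes."""
    return (-sum(data)) & 0xFF


def build_get_chan_auth_cap() -> bytes:
    """IPMI v1.5 Get Channel Authentication Capabilities, sessionless."""
    rmcp = bytes([0x06, 0x00, 0xFF, 0x07])         # class IPMI
    session = bytes([0x00]) + bytes(4) + bytes(4)  # authtype NONE, sequence 0, session ID 0
    head = bytes([0x20, 0x06 << 2])                # Rs Addr, NetFn 0x06 with Rs LUN 0
    body = bytes([0x81, 0x00, 0x38, 0x8E, 0x04])   # Rq Addr, Rq Seq/LUN, command, data
    msg = head + bytes([checksum(head)]) + body + bytes([checksum(body)])
    return rmcp + session + bytes([len(msg)]) + msg


def is_pong(reply) -> bool:
    """True if the reply looks like an ASF Presence Pong (message type 0x40)."""
    return (reply is not None and len(reply) >= 9
            and reply[3] == 0x06 and reply[8] == 0x40)


def probe(host: str, packet: bytes, timeout: float = 2.0):
    """Send one UDP packet to the BMC; return the reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, (host, RMCP_PORT))
        try:
            reply, _addr = s.recvfrom(1024)
            return reply
        except socket.timeout:
            return None
```

For example, is_pong(probe("172.16.0.14", build_rmcp_ping())) would tell
you whether the BMC answers on the LAN port at all.  If the ping gets a
pong but probe("172.16.0.14", build_get_chan_auth_cap()) returns None,
the problem is more likely in the BMC's IPMI stack than in the network
path.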

Andy

-----Original Message-----
From: Rahul Nabar [mailto:rpna...@gmail.com] 
Sent: Sunday, August 29, 2010 5:17 PM
To: linux-powere...@dell.com; ipmitool-devel@lists.sourceforge.net
Subject: [Ipmitool-devel] Is the BMC robust to recover from system
hangs? ipmitool unresponsive

I typically use out-of-band ipmitool to reboot machines that might
once in a while be unreachable via ssh remotely because something went
wrong.

Earlier today I was running a new, challenging parallel job over 32
servers and something went wrong. I suspect the nodes ran out of
memory and after that a bunch of nodes became unresponsive. On some I
was able to use ipmitool to reboot:

/usr/bin/ipmitool -f ~/ipmi_pw -I lanplus -U root -H 172.16.0.13 power cycle

But on a bunch of them the BMC just doesn't respond to ipmitool:

/usr/bin/ipmitool -f ~/ipmi_pw -I lanplus -U root -H 172.16.0.13 power status

just hangs.

Isn't the whole point of the BMC the ability to connect and recover
out-of-band? Even if (worst case) my kernel panicked and the TCP stack
collapsed, shouldn't IPMI still be able to talk to the BMC? I had
checked that, before the job crashed the nodes, the BMCs were working
and responsive to IPMI.

Am I doing something wrong here, or is this non-robustness a documented
shortcoming of BMCs? Any comments from others using BMCs / IPMI
are very welcome!

Verbose:
no clues

More verbose:
IPMI LAN host 172.16.0.14 port 623

>> Sending IPMI command payload
>>    netfn   : 0x06
>>    command : 0x38
>>    data    : 0x8e 0x04


>> Sending IPMI command payload
>>    netfn   : 0x06
>>    command : 0x38
>>    data    : 0x8e 0x04

Most verbose:
IPMI LAN host 172.16.0.14 port 623

>> Sending IPMI command payload
>>    netfn   : 0x06
>>    command : 0x38
>>    data    : 0x8e 0x04

BUILDING A v1.5 COMMAND
>> IPMI Request Session Header
>>   Authtype   : NONE
>>   Sequence   : 0x00000000
>>   Session ID : 0x00000000
>> IPMI Request Message Header
>>   Rs Addr    : 20
>>   NetFn      : 06
>>   Rs LUN     : 0
>>   Rq Addr    : 81
>>   Rq Seq     : 00
>>   Rq Lun     : 0
>>   Command    : 38

Are there any more options I can pass to ipmitool to increase
verbosity and know at what exact point the hang is occurring?



-- 
Rahul

------------------------------------------------------------------------------
Sell apps to millions through the Intel(R) Atom(Tm) Developer Program
Be part of this innovative community and reach millions of netbook users
worldwide. Take advantage of special opportunities to increase revenue and
speed time-to-market. Join now, and jumpstart your future.
http://p.sf.net/sfu/intel-atom-d2d
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel