Your BMC simply isn't responding to any traffic.  When done properly, a BMC
should be completely resilient to OS failures: apart from things like power
failures in non-redundant systems, not much should be capable of knocking
out a quality IPMI implementation.  You need to go to your system vendor's
support for an explanation and/or resolution, since implementations vary
greatly from one vendor to the next.  Sometimes a vendor is not competent to
make it work, sometimes a vendor is too cheap to make it easy, and sometimes
a vendor simply hasn't covered your particular NIC driver/OS combination, and
the NIC vendor flubbed some register handling (or some such) so that the NIC
shoots itself when the kernel panics.  On a BMC that shares that port with
the host, out-of-band access goes down along with it.

  From:       Rahul Nabar <rpna...@gmail.com>
  To:         linux-powere...@dell.com, ipmitool-devel@lists.sourceforge.net
  Date:       08/29/2010 05:18 PM
  Subject:    [Ipmitool-devel] Is the BMC robust to recover from system hangs? impitool unresponsive

I typically use out-of-band ipmitool to reboot machines that occasionally
become unreachable over ssh because something went wrong.

Earlier today I was running a new, challenging parallel job across 32
servers and something went wrong. I suspect the nodes ran out of
memory, and after that a bunch of nodes became unresponsive. On some I
was able to use ipmitool to reboot:

/usr/bin/ipmitool -f ~/ipmi_pw -I lanplus -U root -H 172.16.0.13 power cycle

But on a bunch of others, the BMC just doesn't respond to ipmitool:

/usr/bin/ipmitool -f ~/ipmi_pw -I lanplus -U root -H 172.16.0.13 power status
just hangs.
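
One practical consequence of a BMC that hangs instead of failing is that a
scripted check over all 32 nodes stalls on the first dead one.  A minimal
Python sketch, assuming ipmitool lives at /usr/bin/ipmitool and the password
file is ~/ipmi_pw as in the commands above, that bounds each `power status`
call with a wall-clock timeout.  The helper names and the 10-second default
are hypothetical choices for illustration; only the ipmitool options already
used in this message appear in the command line.

```python
import os
import subprocess


def bmc_responds(argv, timeout_s=10.0):
    """Run one command; True only if it exits with status 0 before the deadline."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # run() kills and reaps the child when this expires
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time: treat this BMC as unreachable
    except FileNotFoundError:
        return False  # the command itself is not installed at that path
    return result.returncode == 0


def power_status_argv(host, pw_file="~/ipmi_pw", user="root"):
    """Same options as the command above; no shell is involved, so expand ~ here."""
    return [
        "/usr/bin/ipmitool", "-f", os.path.expanduser(pw_file),
        "-I", "lanplus", "-U", user, "-H", host, "power", "status",
    ]


def unresponsive_bmcs(hosts, timeout_s=10.0):
    """Return the hosts whose BMC did not answer `power status` in time."""
    return [h for h in hosts if not bmc_responds(power_status_argv(h), timeout_s)]
```

For example, `unresponsive_bmcs([f"172.16.0.{i}" for i in range(1, 33)])`
would list the silent BMCs, assuming (hypothetically) the 32 nodes sit at
172.16.0.1 through 172.16.0.32.  The checks run one after another, so the
worst case is roughly 32 times the timeout.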

Isn't the whole point of the BMC the ability to connect and recover
out-of-band? Even if (worst case) my kernel panicked and the TCP stack
collapsed, shouldn't IPMI still be able to talk to the BMC? Before the
job crashed the nodes, I had checked that the BMCs were working and
responsive to IPMI.

Am I doing something wrong here, or is this non-robustness a documented
shortcoming of BMCs? Any comments from others using BMCs / IPMI
are very welcome!

Verbose:
no clues

More verbose:
IPMI LAN host 172.16.0.14 port 623

>> Sending IPMI command payload
>>    netfn   : 0x06
>>    command : 0x38
>>    data    : 0x8e 0x04


>> Sending IPMI command payload
>>    netfn   : 0x06
>>    command : 0x38
>>    data    : 0x8e 0x04

Most verbose:
IPMI LAN host 172.16.0.14 port 623

>> Sending IPMI command payload
>>    netfn   : 0x06
>>    command : 0x38
>>    data    : 0x8e 0x04

BUILDING A v1.5 COMMAND
>> IPMI Request Session Header
>>   Authtype   : NONE
>>   Sequence   : 0x00000000
>>   Session ID : 0x00000000
>> IPMI Request Message Header
>>   Rs Addr    : 20
>>   NetFn      : 06
>>   Rs LUN     : 0
>>   Rq Addr    : 81
>>   Rq Seq     : 00
>>   Rq Lun     : 0
>>   Command    : 38

Are there any more options I can pass to ipmitool to increase
verbosity and find out exactly where the hang is occurring?
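
The dump above already narrows down where the hang happens.  Reading it
against the IPMI v2.0 specification: NetFn 06h with command 38h is Get
Channel Authentication Capabilities, and Authtype NONE with Session ID 0
means no session exists yet.  This is the request the lanplus interface
sends before it tries to open a session, and the "More verbose" output
shows it going out twice with no reply logged in between.  In other words,
the BMC is silent before session setup even starts.  A minimal Python
sketch that decodes the two request data bytes; the function and key names
are hypothetical, chosen only for illustration.

```python
# Decodes the request shown in the dump. Field layout per the IPMI v2.0
# spec's Get Channel Authentication Capabilities command (App NetFn 06h, cmd 38h).
PRIVILEGE_LEVELS = {1: "Callback", 2: "User", 3: "Operator", 4: "Administrator", 5: "OEM"}


def decode_get_channel_auth_caps(netfn, cmd, data):
    """Decode the two request data bytes of Get Channel Authentication Capabilities."""
    if (netfn, cmd) != (0x06, 0x38):
        raise ValueError("not a Get Channel Authentication Capabilities request")
    if len(data) != 2:
        raise ValueError("expected exactly two request data bytes")
    chan_byte, priv_byte = data
    channel = chan_byte & 0x0F  # bits 3:0; 0Eh means "the channel this arrived on"
    return {
        "want_v2_extended_data": bool(chan_byte & 0x80),  # bit 7
        "channel": "current" if channel == 0x0E else channel,
        "max_privilege": PRIVILEGE_LEVELS.get(priv_byte & 0x0F, "reserved"),
    }


# Values copied from the dump: netfn 0x06, command 0x38, data 0x8e 0x04.
print(decode_get_channel_auth_caps(0x06, 0x38, [0x8E, 0x04]))
# prints {'want_v2_extended_data': True, 'channel': 'current', 'max_privilege': 'Administrator'}
```

So the packet is simply asking the BMC for its authentication capabilities
at Administrator privilege, and nothing comes back.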



--
Rahul

------------------------------------------------------------------------------

Sell apps to millions through the Intel(R) Atom(Tm) Developer Program
Be part of this innovative community and reach millions of netbook users
worldwide. Take advantage of special opportunities to increase revenue and
speed time-to-market. Join now, and jumpstart your future.
http://p.sf.net/sfu/intel-atom-d2d
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
