I don't know much about Dell specifically, but I can offer some general guidance.

If the Broadcom part uses the tg3 driver, you may be out of luck depending
on the failure state.  For example, BCM5704 chips fundamentally cannot
provide BMC access while executing PXE.  On the other hand, chips managed
by bnx2 tend to fare better: there is generally at least one way to make
them work correctly, though driver and NIC firmware versions still matter
*greatly*.  They are not as resilient as I would like, but with precautions
in how you manage firmware and drivers, it's workable.

You'll want to check your tg3/bnx2/whatever driver version and NIC firmware
version, depending on what your investigation turns up.  Shared NICs can
work great, but some implementations can be picky about which drivers and
firmware are in place.  Also, newer is not always better: sometimes a
developer who isn't thinking about the IPMI access some NICs provide will
unwittingly break it in the driver, and it won't get fixed until some
server vendor or industrious administrator stumbles across it.
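
For what it's worth, here is a rough sketch of how you might collect the
driver and firmware versions across nodes. It assumes ethtool is installed
and that the interface is named eth0 (both are my assumptions, adjust as
needed); it just parses the "key: value" lines that `ethtool -i` prints.
The helper names are made up for illustration.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI
        # bus address (which contains colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def nic_versions(interface="eth0"):
    """Run `ethtool -i <interface>`; return (driver, version, firmware).

    Assumes ethtool is on PATH; "eth0" is only a placeholder name.
    """
    out = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    info = parse_ethtool_info(out)
    return (info.get("driver"), info.get("version"),
            info.get("firmware-version"))
```

Calling nic_versions("eth0") on each node and comparing the results against
a combination known to keep the BMC reachable is probably the quickest way
to spot a node with a mismatched driver or firmware.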

                                                                       
  From:       Rahul Nabar <rpna...@gmail.com>
  To:         Jarrod B Johnson/Raleigh/i...@ibmus
  Cc:         ipmitool-devel@lists.sourceforge.net, linux-powere...@dell.com
  Date:       08/30/2010 10:52 AM
  Subject:    Re: [Ipmitool-devel] Is the BMC robust to recover from system
              hangs? impitool unresponsive

On Mon, Aug 30, 2010 at 7:52 AM, Jarrod B Johnson <jbjoh...@us.ibm.com>
wrote:
  Your BMC simply isn't responding to any traffic. BMCs are supposed to be
  completely resilient to OS failures when done properly (not much apart
  from things like power failures in non-redundant systems should be
  capable of knocking out a quality IPMI implementation). You need to look
  to your system vendor's support for an explanation and/or resolution,
  since implementations vary greatly from one vendor to the next. Sometimes
  a vendor is not competent to make it work, sometimes a vendor is too
  cheap to make it easy, and sometimes a vendor simply hasn't covered your
  particular NIC driver/OS combination and the NIC vendor flubbed some
  register handling or some such to make the NIC shoot itself when the
  kernel panics.



Thanks for the tips, Jarrod! I will look into the nodes. These are
Dell R410 servers with the on-board Broadcom NIC. The first thing for this
Monday morning is for me to trudge down to the dark depths of the cluster
room and to manually log in and see what exactly happened to these nodes.

I'll post on the list if I find anything interesting.

--
Rahul
