You should check for updated on-board NIC firmware as well as upgrading
your tg3 driver to the latest version.

Your problem is almost certainly one where the tg3 driver stops
picking up traffic from the Broadcom buffers.  When the buffers fill
without a driver request to flush them, the NIC's passthrough to the BMC
begins to fail.

Broadcom added a NIC watchdog to their firmware and driver: the driver
notifies the firmware that it is loaded and then sends a heartbeat
every so often.  If the kernel panics and/or tg3 goes out to lunch, the
firmware sees the missed heartbeats and switches the NIC to operate as
if no driver were loaded, which maintains passthrough.
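
To make the handshake concrete, here is a toy model of the firmware
side in Python.  It is only a sketch: the class and method names are
made up for illustration, the 120-second interval is taken from the tg3
comment mentioned further down, and the number of beats the firmware
tolerates missing is purely my assumption.

```python
from dataclasses import dataclass

# Toy model only: these names and the grace period are illustrative,
# not Broadcom's actual firmware interface or values.
HEARTBEAT_INTERVAL_S = 120   # interval quoted in the tg3 source comment
MISSED_BEATS_ALLOWED = 2     # assumed grace period, not from the firmware


@dataclass
class NicFirmware:
    driver_loaded: bool = False
    last_heartbeat_s: float = 0.0

    def driver_start(self, now_s: float) -> None:
        """Driver announces that it is loaded and owns the NIC."""
        self.driver_loaded = True
        self.last_heartbeat_s = now_s

    def heartbeat(self, now_s: float) -> None:
        """Driver checks in; ignored if no driver is registered."""
        if self.driver_loaded:
            self.last_heartbeat_s = now_s

    def tick(self, now_s: float) -> None:
        """Firmware check: revert to no-driver mode once beats are missed."""
        deadline = (self.last_heartbeat_s
                    + HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED)
        if self.driver_loaded and now_s > deadline:
            self.driver_loaded = False


if __name__ == "__main__":
    fw = NicFirmware()
    fw.driver_start(now_s=0)
    fw.heartbeat(now_s=120)      # last beat before the simulated panic
    for t in (200, 361):
        fw.tick(now_s=t)
        print(t, "driver mode" if fw.driver_loaded else "no-driver mode")
```

The point is that the fallback is decided entirely by the firmware, so
it still happens when the host kernel is dead.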

Another workaround, if this isn't working or isn't workable for you,
is to set up the BMC watchdog timer, so that if the kernel panics, the
system will either power off or reset and you preserve pass-through to
the BMC.
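
If you want to arm the BMC watchdog by hand, the request is the
standard IPMI Set Watchdog Timer command (NetFn App 0x06, command
0x24).  The Python helper below is a sketch that only builds the
argument list for "ipmitool raw"; the function name, the choice of the
SMS/OS timer use, and the 300-second timeout are my assumptions, so
check the byte layout against the IPMI spec and your BMC before relying
on it.

```python
from typing import List

NETFN_APP = 0x06          # IPMI Application network function
CMD_SET_WATCHDOG = 0x24   # Set Watchdog Timer
TIMER_USE_SMS_OS = 0x04   # timer owned by the running OS / management software
CLEAR_SMS_OS_FLAG = 0x10  # clear the SMS/OS expiration flag (bit 4)

# Timeout actions from the Set Watchdog Timer definition.
ACTIONS = {"none": 0x00, "hard_reset": 0x01,
           "power_down": 0x02, "power_cycle": 0x03}


def set_watchdog_args(timeout_s: float, action: str = "hard_reset") -> List[str]:
    """Build 'ipmitool raw' arguments for Set Watchdog Timer (hypothetical helper)."""
    ticks = round(timeout_s * 10)          # countdown is in 100 ms units
    if not 0 < ticks <= 0xFFFF:
        raise ValueError("timeout must be between 0.1 and 6553.5 seconds")
    data = [
        TIMER_USE_SMS_OS,
        ACTIONS[action],
        0x00,                # pre-timeout interval: unused here
        CLEAR_SMS_OS_FLAG,
        ticks & 0xFF,        # initial countdown, least significant byte first
        ticks >> 8,
    ]
    header = ["raw", "0x%02x" % NETFN_APP, "0x%02x" % CMD_SET_WATCHDOG]
    return header + ["0x%02x" % b for b in data]


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is armed by accident.
    print("ipmitool -I open " + " ".join(set_watchdog_args(300)))
```

Setting the timer does not start it: the countdown begins on a Reset
Watchdog Timer request (raw 0x06 0x22), and something on the running OS
then has to keep sending that request before the countdown expires.
When the kernel panics, the resets stop and the BMC takes the
configured action.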

On IBM systems we've paid a lot of attention to this and understand
very well how to deal with kernel panics and the like with respect to
systems management.  Unfortunately, if you use a NIC driver (and you
almost certainly want to), that driver must implement this watchdog
facility, so you need a newer tg3.  The BMC watchdog is the solution
that works regardless of which NIC driver was loaded prior to a panic.
We use the same BCM5721 chips on the x336 and x346, and I have
positively confirmed that, given the right firmware/driver combination,
a system panic does not knock out system access.

Look for FWCMD_NICDRV_ALIVE somewhere in your tg3 source; it should
have something near it like  /* Heartbeat is only sent once every 120
seconds.  */.  If so, your tg3 has the bit needed to check in with the
watchdog; just make sure your vendor gives you the latest NIC firmware
to go with it.  If your vendor is unable to provide NIC firmware that
helps you overcome this, buy from IBM and fund my salary ;)
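
A plain grep for the symbol is enough, but if you want the surrounding
lines so you can eyeball the heartbeat comment, something like this
Python sketch works.  The function name is hypothetical and the path to
the driver source is whatever it is in your kernel tree; pass it on the
command line.

```python
import sys

MARKER = "FWCMD_NICDRV_ALIVE"


def find_marker(source_text, marker=MARKER, context=2):
    """Return (line_number, snippet) for every line that contains marker."""
    lines = source_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if marker in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] is the path to your tg3 source file.
    with open(sys.argv[1], errors="replace") as fh:
        hits = find_marker(fh.read())
    if not hits:
        print("no %s found: this tg3 probably lacks the heartbeat" % MARKER)
    for lineno, snippet in hits:
        print("line %d:\n%s\n" % (lineno, snippet))
```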

 

On Wed, 2006-10-04 at 15:28 +0100, Colin Keith wrote:
> Hi,
> 
> I'm still trying to work out my IPMI problems and I was wondering if anyone
> could make suggestions on the following. I have a server out at a data
> center; it's a dual AMD 254 Opteron box on a Tyan m/b with Broadcom NetXtreme
> BCM5721 NICs (using tg3 driver). It was shipped from Penguin (and since I
> see them on this list occasionally I assume that they tested all of this
> stuff and know it works :).
> 
> I can access the box via IPMI over the LAN without problems when it is up
> and running, but for some reason it is falling down every few weeks
> (running FC5 and it seems to be a problem with the aacraid driver in 2.6.17
> ..). When it does I can't get any commands through at all, including a
> reset, which was our entire reason for getting the IPMI support on the box.
> 
> # ipmitool -I open bmc info
> Device ID                 : 32
> Device Revision           : 1
> Firmware Revision         : 1.6
> IPMI Version              : 2.0
> Manufacturer ID           : 20569
> Manufacturer Name         : Unknown (0x5059)
> Product ID                : 17 (0x0011)
> Device Available          : yes
> Provides Device SDRs      : yes
> Additional Device Support :
>     Sensor Device
>     SDR Repository Device
>     SEL Device
>     FRU Inventory Device
>     IPMB Event Receiver
>     IPMB Event Generator
> Aux Firmware Rev Info     :
>     0x00
>     0x00
>     0x00
>     0x00
> 
> LAN channel:
> 
> # ipmitool -I open channel info 6
> Channel 0x6 info:
>   Channel Medium Type   : 802.3 LAN
>   Channel Protocol Type : IPMB-1.0
>   Session Support       : multi-session
>   Active Session Count  : 0
>   Protocol Vendor ID    : 7154
>   Volatile(active) Settings
>     Alerting            : enabled
>     Per-message Auth    : enabled
>     User Level Auth     : enabled
>     Access Mode         : always available
>   Non-Volatile Settings
>     Alerting            : enabled
>     Per-message Auth    : enabled
>     User Level Auth     : enabled
>     Access Mode         : always available
> 
> # ipmitool -I open lan print 6
> Set in Progress         : Set Complete
> Auth Type Support       : NONE MD2 MD5 PASSWORD
> Auth Type Enable        : Callback : NONE MD2 MD5 PASSWORD
>                         : User     : NONE MD2 MD5 PASSWORD
>                         : Operator : NONE MD2 MD5 PASSWORD
>                         : Admin    : NONE MD2 MD5 PASSWORD
>                         : OEM      : NONE MD2 MD5 PASSWORD
> IP Address Source       : Static Address
> IP Address              : 38.x.y.z
> Subnet Mask             : 255.255.255.224
> MAC Address             : 00:a0:d1:e1:e6:24
> SNMP Community String   : XXXXXXX
> IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
> BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Enabled
> Gratituous ARP Intrvl   : 2.0 seconds
> Default Gateway IP      : 38.x.y.z
> Default Gateway MAC     : 00:16:46:7d:f6:00
> Backup Gateway IP       : 38.x.y.z
> Backup Gateway MAC      : 00:03:47:84:c4:3f
> 802.1q VLAN ID          : Disabled
> 802.1q VLAN Priority    : 0
> RMCP+ Cipher Suites     : 0,1
> Cipher Suite Priv Max   : Not Available
> 
> # ipmitool -H XXXX -I lanplus -U XXXXX -f /root/.ipmipass -C AES-CBC-128 user list 6
> ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
> 1                    true    false      true       ADMINISTRATOR
> 2   XXXXX            true    false      true       ADMINISTRATOR
> 3   QRSTUVWXYZ123456 true    false      true       ADMINISTRATOR
> 4   7890-=&*()_+     true    false      true       ADMINISTRATOR
> 
> 
> While it was down I would send a command from a box on the same LAN
> segment and get:
> 
> # ipmitool -H XXX -I lan -U XXXXX -f /root/.ipmipass -C AES-CBC-128 -vv chassis status
> ipmi_lan_send_cmd:opened=[0], open=[134767402]
> IPMI LAN host apollo port 623
> Sending IPMI/RMCP presence ping packet
> ipmi_lan_send_cmd:opened=[1], open=[134767402]
>   No response from remote controller
> Get Auth Capabilities command failed
> ipmi_lan_send_cmd:opened=[1], open=[134767402]
> 
> # ipmitool -V
> ipmitool version 1.8.8
> 
> (both boxes)
> 
> I have the BMC set to send gratuitous ARPs every 2s. I now have another
> box out there too and I can see the IPMI enabled server sending these
> packets out, even when it was down. I had the ARP info hard coded on the
> other box to ensure that wasn't a problem too, but I wasn't able to get
> responses back from the box. As there was no response from the remote
> server it seems most likely that it wasn't picking up the data. It is,
> I suppose, possible that it was processing the request and responding and
> for some reason those packets didn't reach my other server because the
> responses contained invalid info, but I saw no evidence of any other
> traffic on the box I was sending the commands from so I don't think that
> this is likely - plus it responds without problem when the OS is running.
> 
> Questions:
> 
> I previously had the "IP Source Address" on the LAN channel set to the
> default of "Unspecified". I've since changed it to "static". Would this
> have made a difference?
> 
> I share the BMC IP with the server's IP. This isn't a problem when the OS
> is running or when the box is plugged in but not turned on. Could this be
> a problem if the OS crashes though?
> 
> Does anyone know, from the manufacturer ID, who/where I should be looking
> for firmware updates? Also if anyone has this same manufacturer, do you use
> any of the special workarounds? (the "-o" switch in the ipmitool command)
> Could you post your config/say if you have this problem as well or not.
> 
> While I'm at it, does anyone have a list of the hex codes for the raw
> commands?
> 
> I can't for the life of me work out why IPMI works if the box is up and
> running the OS, or if it is powered down and it can then be powered up
> again, but if FC5 crashes then I can't get any response from the IPMI
> controller.
> 
> Thanks in advance for any suggestions.
> 
> Colin.
> 
> 


_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
