>> Q: Is it normal for the BMC to not be available (including checking
>> status) for some period of time after a power transition?
>
> It frequently depends on your switch configuration. A system reset/on/off
> means link renegotiation. If your switch has LACP enabled for the port,
> or spanning tree is enabled and the port is not edge-port or portfast
> (depending on your switch vendor's terminology), it will take time. 15
> seconds sounds a lot like the typical LACP link delay on some switches.
> It could be spanning-tree delay, but 15 seconds is a bit on the short
> side for such a delay.

Very good info. I am currently using a Cisco 2950. I will try a simpler
unmanaged switch and try to measure the difference.

> Also, it is not uncommon for the interface to operate at
> different speeds when off and on. I have had more than one customer try
> to hard-set their switch ports to gigabit speed, then call wondering why
> the BMCs were unreachable in the off state. On standby power, a
> 100Mbit link will often be used rather than gigabit. This shouldn't be
> your problem, but just a warning not to attempt hard-setting the link
> speed to shorten negotiation (unless only 100Mbit is fine by you, then
> you can try it).

Excellent insight/advice.

> Some poor switches could indeed incur a significant
> penalty for mere link speed negotiation.

>> Q: Is there a specification that defines how long this period of time
>> should be?
>
> No. Well, maybe if you dug into the various relevant protocols your
> switch implements you might find something, but as far as IPMI is
> concerned, there is nothing on this.

OK

>> Q: If not, what kinds of times are typical?
>
> Optimally, no more than 2-3 seconds. Switches configured for LACP
> and/or spanning tree could block the port from forwarding packets for
> up to 60 seconds or so. Some switches will cause longer outages no
> matter what.

Understood.

>> Q: Does it vary from implementation to implementation?
>
> Probably, though it probably has more to do with the switching equipment
> hooked up and configured, and somewhat to do with what NIC chip is used
> in the server and how the server's NIC is set up in firmware.

OK

>> I wrote a command line that asked for 'power chassis status' and
>> slept for 2 seconds. I observe that there are times during OS driver
>> initialization when it times out.
>>
>> Presumably this happens because the driver is interfering with the NIC
>> and blocking BMC access.
>>
>> Q: What are 'typical' times for this blockage when the NIC driver
>> initializes?
>
> See answer above, since that event also evokes a link renegotiation,
> though this is potentially more complicated. You could sit on the local
> console, restart the networking service (best if the system has a static
> rather than DHCP config), and leave a ping running to see how many
> packets get lost during the restart. That will measure the delay due to
> the switch not negotiating the link/forwarding packets. If it matches
> the BMC outage (which it probably will), you will know any optimization
> to be had is in the switch-to-NIC interaction.

Good idea.

> If it is shorter than the BMC outage, keep in mind that some NICs, on
> driver load and PXE ROM execution, will have required resources tied up
> for some period of time. For example, the Broadcom 5704 chip will not
> be able to forward IPMI traffic and PXE at the same time, though newer
> chips (NetXtreme II family and later NetXtreme NICs like those used in
> IBM e326m and JS-21) can. However, at least with Broadcom NICs, the
> driver load time outage is pretty much on the order of the same delay
> incurred at power transition times, in other words potentially
> negligible. If it is 15 seconds in all scenarios, I'd look at your
> switchport config or ask your network administrator to do so and, as is
> feasible, shorten STP delays and disable LACP if enabled unless you are
> using it.
>
> Some other cases to be wary of are panicking the kernel while the
> system NIC is up, and a chassis power off without proper shutdown. Keep
> traffic up to the BMC, maybe an amount of broadcast traffic on the
> network, and see if the BMC will continue to work indefinitely. If not,
> investigate the possibility of network driver and NIC firmware upgrades,
> since the inability to access your BMC after a panic kinda puts a damper
> on the whole manageability thing. Alternatively, since a panicked/hung
> system is not something you want to leave in a production environment
> anyway, consider using some watchdog program with your BMC.
> If Linux, the OpenIPMI drivers are included in most distros (just a
> 'modprobe ipmi_si;modprobe ipmi_watchdog' away); add any standard Linux
> watchdog program
> (e.g. http://www.ibiblio.org/pub/Linux/system/daemons/watchdog/)
> and it will auto-reset (or auto-off) your system in the event of a hang.
> When setting this up I set the BMC watchdog timeout to be a bit over
> 3 times the watchdog check-in interval, because I'm paranoid. My typical
> config is that the watchdog will fire after 60 seconds, and the daemon
> checks in with the BMC every 15 seconds. I wrote an even simpler
> watchdog than what I linked to, but it is nearly the simplest possible,
> with even fewer features.

All excellent advice. Enough for me to chew on for a while.

Thank you.

Miguel

_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
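
One way to put numbers on the BMC outages discussed in this thread is to
poll the BMC at a fixed interval and record each stretch during which it
does not answer. What follows is only a minimal Python sketch, not code
from this thread: the function names bmc_reachable and measure_outages,
the host and credentials in the example, and the polling parameters are
all hypothetical. It assumes ipmitool is installed and that the BMC
accepts the 'lan' interface, and it uses ipmitool's 'chassis power status'
query as the probe.

```python
import subprocess
import time


def bmc_reachable(host, user, password, timeout_s=2.0):
    """Return True if 'ipmitool chassis power status' succeeds in time.

    Assumption: ipmitool is on PATH and the BMC speaks the 'lan' interface.
    """
    cmd = ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
           "chassis", "power", "status"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or ipmitool could not be started at all.
        return False
    return result.returncode == 0


def measure_outages(probe, duration_s, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll probe() for duration_s seconds.

    Returns a list of (offset_from_start, length) pairs in seconds, one per
    stretch in which probe() kept returning False. Resolution is limited by
    interval_s plus however long each probe takes.
    """
    outages = []
    start = clock()
    down_since = None
    while True:
        now = clock()
        if now - start >= duration_s:
            break
        ok = probe()
        if not ok and down_since is None:
            down_since = now                      # outage begins
        elif ok and down_since is not None:
            outages.append((down_since - start, clock() - down_since))
            down_since = None                     # outage ended
        sleep(interval_s)
    if down_since is not None:
        # Still unreachable when the measurement window closed.
        outages.append((down_since - start, clock() - down_since))
    return outages


# Example (hypothetical host and credentials):
#   probe = lambda: bmc_reachable("192.0.2.10", "admin", "secret")
#   for offset, length in measure_outages(probe, duration_s=300):
#       print(f"unreachable at +{offset:.0f}s for {length:.0f}s")
```

Running something like this across a power cycle or a networking service
restart, and comparing the reported outage lengths with the ping loss seen
on the host's own address, would indicate whether the delay comes from the
switch-to-NIC link negotiation.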