>> Q: Is it normal for the BMC to be unavailable (including for status
>> checks) for some period of time after a power transition?
>
> It frequently depends on your switch configuration.  A system reset/on/off
> means link renegotiation.  If your switch has LACP enabled for the port,
> or spanning tree is enabled and the port is not set to edge-port or
> portfast (depending on your switch vendor's terminology), it will take
> time.  15 seconds sounds a lot like the typical LACP link delay on some
> switches.  It could be spanning-tree delay, but 15 seconds is a bit on
> the short side for such a delay.

Very good info. Currently using a Cisco 2950. I will try a simpler
unmanaged switch and try to measure the difference.
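
For the comparison, a small timing helper might be enough. This is only a
sketch under assumptions: the function names are made up for illustration,
the probe uses the plain `lan` interface with the password on the command
line, and the 2 second per-query timeout is arbitrary.

```python
import subprocess
import time


def bmc_responds(host, user, password, timeout_s=2):
    """Return True if one 'chassis power status' query succeeds in time."""
    # Assumption: ipmitool is installed and the BMC accepts -I lan sessions.
    cmd = ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
           "chassis", "power", "status"]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def seconds_until_reachable(probe, limit_s=120.0, pause_s=0.5,
                            clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds, or None if limit_s passes first.
    """
    start = clock()
    while clock() - start < limit_s:
        if probe():
            return clock() - start
        sleep(pause_s)
    return None
```

Started right after a power transition, something like
`seconds_until_reachable(lambda: bmc_responds(host, user, password))` gives
one number per switch to compare. The clock and sleep are parameters only so
the loop can be exercised without real waiting.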

> Also, it is not uncommon for the interface to operate at
> different speeds when off and on.  I have had more than one customer try
> to hard-set their switch ports to gigabit speed, then call wondering why
> the BMCs were unreachable in the off state.  In a standby power situation,
> a 100Mbit link will often be used rather than gigabit.  This shouldn't be
> your problem, but take it as a warning not to hard-set the link
> speed to shorten negotiation (unless 100Mbit only is fine by you, in
> which case you can try it).

Excellent insight/advice.

> Some poor switches could indeed incur a significant
> penalty for mere link speed negotiation.

>> Q: Is there a specification that defines how long this period of time
>> should be?
>
> No, well, maybe if you dug into the various relevant protocols your
> switch implements you might find something, but as far as IPMI is
> concerned, there is nothing on this.

OK

>> Q: If not, what kinds of times are typical?
>
> Optimally, no more than 2-3 seconds.  Switches configured for LACP
> and/or spanning tree could block the port from forwarding packets for up
> to 60 seconds or so.  Some switches will cause longer outages no matter
> what.

Understood.

>> Q: Does it vary from implementation to implementation?
>>
>
> Probably, though it likely has more to do with the switching equipment
> hooked up and configured, and somewhat to do with what NIC chip is used
> in the server and how the server's NIC is set up in firmware.

OK

>> I wrote a command line that asked for 'power chassis status' and slept
>> for 2 seconds. I observe that there are times during OS driver
>> initialization when it times out.
>>
>> Presumably this happens because the driver is interfering with the NIC
>> and blocking BMC access.
>>
>> Q: What are 'typical' times for this blockage when the NIC driver
>> initializes?
>>
>
> See the answer above, since that event also triggers a link renegotiation,
> though this is potentially more complicated.  You could sit at the local
> console, restart the networking service (best if the system has a static
> rather than DHCP config), and leave a ping running to see how many
> packets get lost during the restart.  That will measure the delay due to
> the switch not negotiating the link/forwarding packets.  If it matches the
> BMC outage (which it probably will), you will know any optimization to
> be had is in the switch-to-NIC interaction.

Good idea.
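
To turn the lost-packet count into seconds, I might script it roughly like
this. It is a sketch, not a finished tool: the helper names are hypothetical,
the `ping -c 1 -W` flags assume the Linux iputils ping, and one probe per
second limits the resolution to about a second.

```python
import subprocess
import time


def ping_once(host, wait_s=1):
    """Send one ICMP echo (Linux iputils flags); True if a reply arrived."""
    cmd = ["ping", "-c", "1", "-W", str(wait_s), host]
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except OSError:
        return False


def sample(probe, count, interval_s=1.0,
           clock=time.monotonic, sleep=time.sleep):
    """Collect (timestamp, replied) pairs from 'count' probes."""
    samples = []
    for _ in range(count):
        samples.append((clock(), probe()))
        sleep(interval_s)
    return samples


def longest_outage(samples):
    """Longest span in seconds from the first lost probe to the next reply.

    An outage that is still open at the last sample is not counted.
    """
    worst = 0.0
    lost_since = None
    for stamp, replied in samples:
        if not replied and lost_since is None:
            lost_since = stamp          # outage begins
        elif replied and lost_since is not None:
            worst = max(worst, stamp - lost_since)
            lost_since = None           # outage ends
    return worst
```

Running `longest_outage(sample(lambda: ping_once(target), 120))` across a
network restart, and again across a power transition against the BMC address,
would give the two figures to compare (`target` being whatever address is
pinged).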

> If it is shorter than the BMC outage, keep in mind that some NICs will
> tie up required resources for some period of time on driver load and PXE
> ROM execution.  For example, the Broadcom 5704 chip cannot forward IPMI
> traffic and PXE at the same time, though newer chips (the NetXtreme II
> family and later NetXtreme NICs like those used in the IBM e326m and
> JS-21) can.  However, at least with Broadcom NICs, the driver load time
> outage is pretty much on the order of the same delay incurred at
> power transition times, in other words potentially negligible.  If it is
> 15 seconds in all scenarios, I'd look at your switchport config, or ask
> your network administrator to do so, and as far as is feasible shorten
> STP delays and disable LACP if it is enabled and you are not using it.
>
> Some other cases to be wary of are panicking the kernel while the system
> NIC is up, and a chassis power off without a proper shutdown.  Keep
> traffic up to the BMC, maybe with an amount of broadcast traffic on the
> network, and see if the BMC will continue to work indefinitely.  If not,
> investigate the possibility of network driver and NIC firmware upgrades,
> since the inability to access your BMC after a panic kinda puts a damper
> on the whole manageability thing.  Alternatively, since a panicked/hung
> system is not something you want to leave in a production environment
> anyway, consider using some watchdog program with your BMC.  On Linux,
> use the OpenIPMI drivers included in most distros (just a 'modprobe
> ipmi_si;modprobe ipmi_watchdog' away) and any standard Linux watchdog
> program (e.g. http://www.ibiblio.org/pub/Linux/system/daemons/watchdog/),
> and it will auto-reset (or auto-off) your system in the event of a hang.
> When setting this up I set the BMC watchdog timeout to be a bit over
> 3 times the watchdog check-in interval, because I'm paranoid.  My typical
> config is that the watchdog will fire after 60 seconds, and the daemon
> checks in with the BMC every 15 seconds.  I wrote an even simpler watchdog
> than the one I linked to; it is nearly the simplest possible, with even
> fewer features.

All excellent advice. Enough for me to chew on for a while. Thank you.
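
To make sure I follow the watchdog timing, here is how I picture the
simplest check-in loop. It is a sketch and not the daemon you wrote: the
function names are invented, the 60 and 15 second figures are the ones quoted
above (a ratio of 4), and it relies only on the general Linux behaviour that
a write to the watchdog device resets the countdown.

```python
import time

CHECKIN_INTERVAL_S = 15   # daemon check-in period quoted above
BMC_TIMEOUT_S = 60        # watchdog expiry quoted above


def margin(timeout_s, interval_s):
    """How many check-in intervals fit inside the watchdog timeout."""
    return timeout_s / interval_s


def keep_alive(dev, interval_s, rounds, sleep=time.sleep):
    """Write to the watchdog device 'rounds' times, pausing between writes.

    'dev' is any writable binary file object. Each write restarts the
    watchdog countdown on a real device.
    """
    for _ in range(rounds):
        dev.write(b"\0")
        dev.flush()
        sleep(interval_s)
```

On a real system `dev` would presumably be
`open("/dev/watchdog", "wb", buffering=0)` once the two modules are loaded.
Opening the device normally arms the timer, so the machine will reset if the
loop ever stops, which is the point but worth knowing before trying it on a
box you care about.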

Miguel


_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
