OK, so I had a network outage yesterday, thankfully on a Sunday, so no productivity was lost. Here is my setup:
All HP switches. My core switch is an 8212zl, and my physical VMware servers and NetApp storage are connected to 2 stacked 3800 switches. The stack is trunked to the core switch with 2 x 10 Gb links, plus 2 x 1 Gb copper links as failover.
Here is what happened: just after 2 PM I started getting e-mails about servers being offline for more than 5 minutes, and the list just kept growing. I had just done some UPS power balancing and had to shut down a few items for the move, so I figured I might have disturbed a power cable while making the changes. I drove back and physically checked everything, and everything looked good. I could ping the gateway from some servers but not from others, so the whole thing was very strange. Finally we figured it out: one of the 10 Gb trunk links had failed, but the core switch did not realize it was down. That is what caused the strange network behaviour.
So now my monitoring guy says that if the trunk had been configured with LACP there would have been no outage, and that they configure all switch-to-switch trunking with LACP. I asked the networking guy who did the initial configuration, and his comment is:
"LACP is an industry standard and is widely used when you connect servers to switches, or to different vendors' switches and other networking gear. When you have the same make (HP, or say Cisco), most folks use Cisco EtherChannel / port channel (which also works with HP), or in HP language, a trunk. I have never come across anything like this, so I will not comment on whether LACP would have prevented this kind of issue. If there is a fiber issue, you can have a unidirectional link, and then it is the UDLD feature, with LACP also enabled, that helps. But unidirectional fiber links are extremely rare; otherwise, why would 98% of Cisco networks not use LACP? The issue here is that you have someone else managing the network and you use me to help you set up the network, so there will always be a conflict of interest and differences in viewpoints."
So is there a correct answer here, or was I just extremely unlucky with a hardware failure that did not fail over?
__________________________________
Stefan Jafs

