Robert,

You probably have bad hardware, but two sups both being bad sounds a little suspect.
This matches CSCtx83944 a little bit. Would you be willing to supply me the serial numbers of the failed modules? Otherwise, it's probably worth opening a case with TAC and seeing what they say. My guess is an RMA.

Regards,
Pete Lumbis
TAC Routing Protocols Technical Leader

On Thu, Jul 4, 2013 at 7:44 PM, Robert Williams <[email protected]> wrote:

> Hi,
>
> Got a weird persistent issue which I'd like to know if anyone else has
> seen. We have a site with a 6503-E chassis, with a 720-3BXL in slot 1 and
> a 6516A-GBIC in slot 3. It had been running fine (for 310 days) until
> recently the facility it's hosted at got very cold (supply air at 15
> degrees, hitting the base of the chassis through the in-rack floor vent).
> Then it started crashing until it warmed up, at which point it was fine
> again.
>
> It crashed each time with the same SP error:
>
> %FABRIC-SP-3-DISABLE_FAB: The fabric manager disabled active fabric in
> slot 1 due to the error (2) on this channel (FPOE 4) connected to slot 1
>
> The most useful commands I found were:
>
> #show fabric fpoe map
>  slot    channel    fpoe
>    1        0         4
>    1        1         0
>    2        0         5
>    2        1        14
>    3        0         2
>    3        1        11
>
> #show fabric fpoe interface gi1/1
> fpoe for GigabitEthernet1/1 is 4
>
> #show fabric fpoe interface gi1/2
> fpoe for GigabitEthernet1/2 is 4
>
> This suggests that the channel in question is the one serving the
> supervisor's own onboard ports.
>
> Once we realised it was temperature-related, we decided to get the unit
> back to our lab and test it to see which component was at fault. So we
> simply swapped the whole unit out in one go: the chassis, line card, sup,
> both PSUs and PEMs, even the fan tray. Only the six GBICs remained and
> were connected back into the new line card.
>
> In the lab, we made it fail consistently at around 17 degrees (it would
> last around 3 minutes at that temperature before crashing). At about 21
> degrees it would run all day just fine.
>
> Then 2 days later, the new chassis we had installed back at the same site
> suddenly crashed. Believe it or not, with, you guessed it, exactly the
> same error!
>
> It's running 15.1(1)SY1, doing full-table BGP to 5 other iBGP peers, and
> has around 10 VLANs and not a lot else. Average traffic is around
> 500 Mbit/s total.
>
> Before this failure, the (original) chassis had an uptime of 310 days
> (running an older IOS; we upgraded it when it got replaced).
>
> Is it possible that a GBIC (the only component 'not' swapped) could cause
> this?
>
> Any ideas or suggestions most welcome, as we've literally run out of
> components to swap over!
>
> Cheers,
>
> Robert Williams
> Custodian Data Centre
> Email: [email protected]
> http://www.CustodianDC.com
>
> _______________________________________________
> cisco-nsp mailing list [email protected]
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
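As a footnote to the fpoe outputs quoted above: the lookup Robert did by hand (FPOE 4 in the error, gi1/1 and gi1/2 both resolving to fpoe 4, hence the supervisor's own uplink ports) can be sketched as a small reverse lookup. This is only an illustration using the exact table from the thread; the function name is made up, not a Cisco tool.

```python
# Sketch: invert the 'show fabric fpoe map' output to see which
# (slot, channel) owns the FPOE value named in a fabric error.
# The table is copied verbatim from the output in this thread.

fpoe_map = {
    (1, 0): 4,
    (1, 1): 0,
    (2, 0): 5,
    (2, 1): 14,
    (3, 0): 2,
    (3, 1): 11,
}

def slot_channel_for_fpoe(fpoe):
    """Return the (slot, channel) that maps to this FPOE, or None."""
    for (slot, channel), value in fpoe_map.items():
        if value == fpoe:
            return slot, channel
    return None

# The error reported FPOE 4, and both gi1/1 and gi1/2 report fpoe 4,
# so the disabled channel is slot 1, channel 0: the sup's onboard ports.
print(slot_channel_for_fpoe(4))   # -> (1, 0)
print(slot_channel_for_fpoe(11))  # -> (3, 1), the 6516A-GBIC's channel 1
```

This just mechanises the cross-reference; on a larger chassis with more modules it saves squinting at the map.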
