I've seen this same problem caused by a wire with both ends plugged into one little 5- or 8-port switch. It was a long downtime before I found the wire.
Mark

-----Original Message-----
From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff
Sent: Friday, September 20, 2013 10:59 AM
To: NTSysADM@lists.myitforum.com
Subject: [NTSysADM] Semi-OT: Network problem

All,

In the past couple of weeks, $work has had a problem with network interruptions - frequent gaps in network connectivity where all contact is lost with servers for brief periods of time (1-2 minutes, usually).

I could see the gaps in the graphs on my (very new and incomplete - long story, don't ask) cacti installation. Unfortunately, I've been unable to get cacti to graph CPU utilization for the switches, because they're Procurves, and I couldn't find a working XML file or configuration for that.

It's always happened while I've been unavailable, until today. Just now, I was able to show conclusively that our core layer3 switch (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during these episodes. Volume of traffic is normal - no huge spikes in that, just normal variation, AFAICT, from the cacti graphs. I haven't had time to see if other switches also spike their CPU, but given the gaps in the graphs, I suspect that's the case.

I suspect someone is doing something stupid to create a layer2 loop, as we have lots of little 5 and 8 port switches on desktops and in our engineering lab - and in spite of the fact that I've set our core switch as the root of the spanning tree.

I'm setting up a box to do a tcpdump in a ring buffer with smallish files so that I can do analysis on them more easily. I'm not a packet analysis guy, though I've done some looking on occasion.

Anyone have thoughts on what to look for when I start my analysis?

Kurt
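
On the question of what to look for: one common symptom of a layer2 loop is the same broadcast frame showing up over and over in a short capture, since a looped frame keeps circulating. The sketch below is only a rough heuristic, not a definitive test. It assumes the raw Ethernet frames have already been pulled out of the capture files as bytes (the pcap reading step is not shown), and the name `summarize_frames` is made up for illustration.

```python
import hashlib
from collections import Counter

# Ethernet broadcast destination address (ff:ff:ff:ff:ff:ff).
BROADCAST = b"\xff" * 6


def summarize_frames(frames, top=5):
    """Count broadcast frames and byte-identical repeats.

    `frames` is an iterable of raw Ethernet frames as bytes. How they
    are read from the capture files is assumed and left to the caller.
    """
    digests = Counter()
    total = 0
    broadcast = 0
    for frame in frames:
        total += 1
        # The destination MAC is the first 6 bytes of an Ethernet frame.
        if frame[:6] == BROADCAST:
            broadcast += 1
        # Hash the whole frame so identical copies collapse to one key.
        digests[hashlib.sha1(frame).hexdigest()] += 1
    # Keep only frames that were seen more than once, most frequent first.
    repeats = [(d, n) for d, n in digests.most_common(top) if n > 1]
    return {"total": total, "broadcast": broadcast, "repeats": repeats}
```

If a capture taken during an outage shows one frame repeated hundreds or thousands of times while a capture from a quiet period does not, that would be consistent with a loop. Some repetition is normal (periodic ARP requests and similar traffic), so compare against a baseline before drawing conclusions.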