I've seen a wire with both ends plugged into the same little 5- or 8-port 
switch cause exactly this problem. It was a long outage before I found the wire.

Mark

-----Original Message-----
From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On 
Behalf Of Kurt Buff
Sent: Friday, September 20, 2013 10:59 AM
To: NTSysADM@lists.myitforum.com
Subject: [NTSysADM] Semi-OT: Network problem

All,

In the past couple of weeks, $work has had a problem with network interruptions 
- frequent gaps in network connectivity where all contact is lost with servers 
for brief periods of time (1-2 minutes, usually).

I could see the gaps in the graphs on my (very new and incomplete - long story, 
don't ask) cacti installation. Unfortunately, I've been unable to get cacti to 
graph CPU utilization for the switches, because they're Procurves, and I 
couldn't find a working XML file or configuration for that.

It's always happened while I've been unavailable, until today.

Just now, I was able to show conclusively that our core layer3 switch (Procurve 
3400cl-48G), which was hit hardest, spikes its CPU to 99% during these 
episodes. Volume of traffic is normal - no huge spikes in that, just normal 
variation, AFAICT, from the cacti graphs. I haven't had time to see if other 
switches also spike their CPU, but given the gaps in the graphs, I suspect 
that's the case.

I suspect someone is doing something stupid to create a layer2 loop, as we have 
lots of little 5 and 8 port switches on desktops and in our engineering lab - 
and in spite of the fact that I've set our core switch as the root of the 
spanning tree.

I'm setting up a box to do a tcpdump in a ring buffer with smallish files so 
that I can do analysis on them more easily.
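
For reference, one way the ring-buffer capture might be set up is with 
tcpdump's -C (file size, in millions of bytes) and -W (number of files to 
rotate through) options. The sketch below only builds and prints the command 
line. The interface name, output path, sizes, and the helper name 
tcpdump_ring_command are all assumptions for illustration, not anything from 
the setup described above.

```python
import shlex


def tcpdump_ring_command(interface, out_path, file_mb=50, file_count=40, snaplen=128):
    """Build a tcpdump argv that captures into a fixed-size ring of files.

    All defaults here are illustrative assumptions, not recommendations.
    """
    return [
        "tcpdump",
        "-i", interface,        # capture interface (hypothetical name below)
        "-n",                   # skip name resolution while capturing
        "-s", str(snaplen),     # truncate frames; headers are enough for loop hunting
        "-C", str(file_mb),     # start a new file after this many million bytes
        "-W", str(file_count),  # with -C: keep this many files, then overwrite the oldest
        "-w", out_path,         # base name; tcpdump appends a file number
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it as root.
    print(shlex.join(tcpdump_ring_command("eth0", "/var/tmp/loop.pcap")))
```

With these example values the ring would hold roughly 40 x 50 MB, so the files 
stay small enough to open one at a time.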

I'm not a packet analysis guy, though I've done some looking on occasion.

Anyone have thoughts on what to look for when I start my analysis?
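
One possible starting point, assuming the layer2 loop theory is right: a loop 
usually shows up as the same broadcast frames circulating over and over, so a 
few source MACs account for a disproportionate share of broadcasts during an 
episode. The sketch below is a minimal, standard-library-only reader for 
classic pcap files (not pcapng) with Ethernet framing. It counts broadcast 
frames per source MAC. The function name broadcast_sources and the example 
file name are made up for illustration.

```python
import struct
from collections import Counter

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination address


def broadcast_sources(pcap_bytes):
    """Count broadcast frames per source MAC in a classic pcap capture."""
    magic = pcap_bytes[:4]
    # The magic number tells us the byte order the file was written in
    # (microsecond and nanosecond timestamp variants are both accepted).
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    # The link type is the last field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("expected Ethernet link type (1)")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Destination MAC is bytes 0-5, source MAC is bytes 6-11.
        if len(frame) >= 12 and frame[:6] == BROADCAST:
            counts[frame[6:12].hex(":")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name; point this at one file from the capture ring.
    with open("loop.pcap0", "rb") as fh:
        for mac, n in broadcast_sources(fh.read()).most_common(10):
            print(mac, n)
```

Comparing the top talkers from a file captured during an episode against one 
captured during normal operation may help narrow down which edge switch the 
looped traffic is coming from.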

Kurt

