All,

In the past couple of weeks, $work has had a problem with network
interruptions: frequent gaps in connectivity where all contact with
servers is lost for brief periods (usually 1-2 minutes).

I could see the gaps in the graphs on my (very new and incomplete;
long story, don't ask) Cacti installation. Unfortunately, I haven't been
able to get Cacti to graph CPU utilization for the switches, because
they're ProCurves and I couldn't find a working XML file or
configuration for them.

Until today, it had always happened while I was unavailable.

Just now, I was able to show conclusively that our core layer 3 switch
(ProCurve 3400cl-48G), which was hit hardest, spikes its CPU to 99%
during these episodes. Traffic volume is normal: no huge spikes, just
normal variation, AFAICT from the Cacti graphs. I haven't had time to
check whether other switches also spike their CPU, but given the gaps
in the graphs, I suspect they do.

I suspect someone is doing something stupid that creates a layer 2
loop, since we have lots of little 5- and 8-port switches on desktops
and in our engineering lab, and this happens in spite of the fact that
I've set our core switch as the root of the spanning tree.

I'm setting up a box to run tcpdump into a ring buffer of smallish
files so that I can analyze them more easily.
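For a first pass over those files, here is a rough sketch of the kind of check that might help. It assumes the ring-buffer files are classic pcap with Ethernet framing (what tcpdump -w writes by default), and the function names (parse_pcap, summarize) are made up for illustration. The idea: during a layer 2 loop, the same broadcast or flooded frame keeps circulating byte-for-byte, so you'd expect the broadcast/multicast rate per second to jump and a handful of identical frames to show up many times.

```python
import collections
import hashlib
import struct
import sys

LE_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")  # usec / nsec, little-endian
BE_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")  # usec / nsec, big-endian


def parse_pcap(data):
    """Yield (timestamp, frame) pairs from the bytes of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    magic = data[:4]
    if magic in LE_MAGICS:
        endian = "<"
    elif magic in BE_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # The second magic in each pair means timestamps are in nanoseconds.
    divisor = 1e9 if magic in (LE_MAGICS[1], BE_MAGICS[1]) else 1e6
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != 1:  # 1 = Ethernet
        raise ValueError(f"expected Ethernet link type 1, got {linktype}")
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # truncated last record, e.g. file still being written
        offset += incl_len
        yield ts_sec + ts_frac / divisor, frame


def summarize(records, top=5):
    """Count loop/storm symptoms in an iterable of (timestamp, frame) pairs."""
    per_second = collections.Counter()  # broadcast/multicast frames per second
    repeats = collections.Counter()     # how often each exact frame was seen
    sources = collections.Counter()     # source MACs of broadcast/multicast frames
    total = 0
    for ts, frame in records:
        total += 1
        if len(frame) < 14:  # shorter than an Ethernet header
            continue
        dst, src = frame[0:6], frame[6:12]
        digest = hashlib.sha1(frame).hexdigest()[:12]
        repeats[(dst.hex(":"), src.hex(":"), digest)] += 1
        if dst[0] & 1:  # group bit set: broadcast or multicast destination
            per_second[int(ts)] += 1
            sources[src.hex(":")] += 1
    return {
        "total_frames": total,
        "busiest_seconds": per_second.most_common(top),
        "most_repeated_frames": repeats.most_common(top),
        "top_flooding_sources": sources.most_common(top),
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            report = summarize(parse_pcap(f.read()))
        print(path)
        for key, value in report.items():
            print(f"  {key}: {value}")
```

Run it over the files that cover one of the outages and compare against a quiet period. If one broadcast frame (an ARP request, say) repeats thousands of times within a few seconds, that points fairly strongly at a loop. The source MAC in that frame tells you which host originated it, not which little switch closed the loop, so you'd still need to trace it through the MAC tables.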

I'm not a packet analysis guy, though I've done some looking on occasion.

Anyone have thoughts on what to look for when I start my analysis?

Kurt

