Re: [NTSysADM] Semi-OT: Network problem

2013-09-25 Thread Kurt Buff
Another not-quite-zombie thread update:

After mucho packet capturing, and trying to figure stuff out myself, I
called in the cavalry.

I sent the packets for a small outbreak to an outside firm that I've
used before, and they handed it to their packethead.

It is/was an STP problem. Coming from the Cisco switches in the lab -
there are several in there that are announcing they are the root
bridge, and prod and dev switches ended up fighting.
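Some background on how several switches can end up claiming to be root: in classic 802.1D/RSTP, every bridge starts out advertising itself as root and only defers once it hears a BPDU carrying a lower bridge ID. The ID is compared as priority first, then MAC address. The sketch below models only that comparison. The switch names, priorities and MACs are made up, and it ignores per-VLAN (PVST+) and MSTP behaviour.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str      # hypothetical label, for readability only
    priority: int  # configured bridge priority (lower is better)
    mac: str       # bridge MAC address, e.g. "00:1a:2b:3c:4d:5e"

    def bridge_id(self) -> tuple[int, int]:
        # 802.1D compares bridge IDs as (priority, MAC); the lowest value wins.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """Return the root-election winner among bridges that can hear each other's BPDUs."""
    return min(bridges, key=Bridge.bridge_id)


if __name__ == "__main__":
    core = Bridge("core", 4096, "00:1a:2b:00:00:01")
    access = Bridge("access", 32768, "00:1a:2b:00:00:02")
    lab = Bridge("lab", 0, "00:50:56:00:00:03")
    print(elect_root([core, access]).name)       # core
    print(elect_root([core, access, lab]).name)  # lab
```

Setting the core's priority low only protects you if no other switch sharing that spanning tree is configured even lower.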

I've explained the problem to the director of engineering, and they've
come up with a router and a couple of their own switches, and I'm in
the process of migrating their address space/VLANs off of my equipment
onto their router/switches. I've set up a /30 between the networks,
and will be putting up routes pointing to the new connection as we
migrate stuff off.
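For anyone less familiar with the /30 trick: a /30 holds four addresses. Two of them (network and broadcast) are unusable, which leaves exactly one address for each end of a point-to-point link. Each migrated subnet then gets a route pointing at the far end, and longest-prefix match sends everything else to the existing default route. All addresses below are placeholders, not the poster's.

```python
import ipaddress

# Hypothetical addressing, for illustration only:
#   192.0.2.0/30  - transit link between the core switch and the new lab router
#   10.20.0.0/16  - a lab range that has already been moved behind the lab router
#   198.51.100.1  - the existing default gateway (firewall)
TRANSIT = ipaddress.ip_network("192.0.2.0/30")
CORE_SIDE, LAB_SIDE = TRANSIT.hosts()  # a /30 has exactly two usable addresses

ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): ipaddress.ip_address("198.51.100.1"),
    ipaddress.ip_network("10.20.0.0/16"): LAB_SIDE,
}


def next_hop(destination: str) -> ipaddress.IPv4Address:
    """Pick the next hop using longest-prefix match, as a router would."""
    addr = ipaddress.ip_address(destination)
    candidates = [net for net in ROUTES if addr in net]
    return ROUTES[max(candidates, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    print(CORE_SIDE, LAB_SIDE)       # 192.0.2.1 192.0.2.2
    print(next_hop("10.20.7.15"))    # 192.0.2.2 (migrated lab subnet)
    print(next_hop("203.0.113.9"))   # 198.51.100.1 (default route)
```

As each additional subnet moves, one more prefix pointing at the lab side of the /30 is enough. The more-specific route always beats the default.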

BTW - I came across the following while doing some of the research -
it's pretty good:
http://www.cisco.com/image/gif/paws/10556/spanning_tree1.swf

Kurt








Re: [NTSysADM] Semi-OT: Network problem

2013-09-25 Thread Micheal Espinola Jr
STP will never stop being something to bite us all in the ass...

--
Espi




Re: [NTSysADM] Semi-OT: Network problem

2013-09-25 Thread Jonathan Link
A better allusion might be a branch swatting us in the face.  I've never
known a tree to bite.


On Wed, Sep 25, 2013 at 4:33 PM, Micheal Espinola Jr 
michealespin...@gmail.com wrote:

 STP will never stop being something to bite us all in the ass...

 --
 Espi




RE: [NTSysADM] Semi-OT: Network problem

2013-09-25 Thread Woody Blackman
Well, its bark is worse than its bite...

From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Micheal Espinola Jr
Sent: Wednesday, September 25, 2013 2:45 PM
To: ntsysadm@lists.myitforum.com
Subject: Re: [NTSysADM] Semi-OT: Network problem

I like it! :-)

--
Espi


On Wed, Sep 25, 2013 at 1:37 PM, Jonathan Link jonathan.l...@gmail.com wrote:
A better allusion might be a branch swatting us in the face.  I've never known 
a tree to bite.

On Wed, Sep 25, 2013 at 4:33 PM, Micheal Espinola Jr michealespin...@gmail.com wrote:
STP will never stop being something to bite us all in the ass...

Snip




RE: [NTSysADM] Semi-OT: Network problem

2013-09-23 Thread Michael B. Smith
And bit rot is real.

-----Original Message-----
From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Ben Scott
Sent: Monday, September 23, 2013 5:48 PM
To: ntsysadm@lists.myitforum.com
Subject: Re: [NTSysADM] Semi-OT: Network problem

On Sun, Sep 22, 2013 at 9:56 PM, Kurt Buff kurt.b...@gmail.com wrote:
 Well, I do remember reading a long time ago that traffic shouldn't go 
 through more than three switches on a LAN (was that referred to as the 
 diameter? I can't remember) - that pretty much matches the Cisco model 
 of core, distribution and access, as described here, among many other
 places:

  Well, there's a difference between "not a best practice" and "violates the
specification."

  The 5-4-3 rule arises because of how classic Ethernet works.  It's a shared 
medium, meaning all nodes on the network are connected to a common signalling 
bus.  It takes time for a signal to travel through the bus.  If the bus is too 
long (or has too many repeaters), a signal transmitted from one end may not 
reach the other end before various time windows close, leading to things like 
missed collisions, missed preambles, and the like.  Worse, they won't be 
retried by the link layer when they should be.  (Ethernet does not guarantee 
delivery, but it does have *some* retry logic built-in.)  Silently corrupted 
payload is even possible.
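To make the "time windows" concrete: on 10 Mbit/s Ethernet the collision window (slot time) is 512 bit times. That is the time needed to send a 64-byte minimum frame, or 51.2 microseconds. A sender only notices a collision if the far end's signal gets back to it within that window. The sketch below uses a deliberately simplified delay budget. The 0.77 velocity factor and the 2-microsecond per-repeater delay are assumed round numbers, not figures taken from the specification. The real budget also accounts for transceivers and AUI cables.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
BIT_RATE_BPS = 10_000_000            # classic 10 Mbit/s Ethernet
SLOT_TIME_S = 512 / BIT_RATE_BPS     # 512 bit times = 64-byte minimum frame = 51.2 microseconds


def round_trip_delay_s(segment_lengths_m, velocity_factor, per_repeater_delay_s):
    """Very simplified round-trip delay across a chain of repeated coax segments.

    Ignores transceiver, AUI and other delays that the real specification budgets for.
    """
    propagation = sum(segment_lengths_m) / (velocity_factor * SPEED_OF_LIGHT_M_S)
    repeaters = (len(segment_lengths_m) - 1) * per_repeater_delay_s
    return 2 * (propagation + repeaters)


def collisions_detectable(segment_lengths_m, velocity_factor=0.77, per_repeater_delay_s=2e-6):
    # The sender must still be transmitting its minimum-size frame when a collision
    # at the far end propagates back to it; otherwise it never sees the collision.
    return round_trip_delay_s(segment_lengths_m, velocity_factor, per_repeater_delay_s) < SLOT_TIME_S


if __name__ == "__main__":
    print(collisions_detectable([500] * 5))   # True  (about 37.7 us round trip under these assumptions)
    print(collisions_detectable([500] * 20))  # False (about 163 us round trip)
```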

  (The 5-4-3 rule is technically only a guideline because Ethernet doesn't 
care about repeater hops, it cares about signal timings.  The only way to truly 
confirm adherence to all aspects of the specification was with a network 
analyzer.  Those are expensive, so a rule of thumb was needed.  5-4-3 worked 
for most everything.  If your equipment happened to need less margin within the 
spec, your network could be bigger.)

  Modern Ethernet puts a transceiver at each end of each cable.  Each cable is 
a point-to-point link, as far as the MAC sublayer is concerned.  The switches 
receive and buffer frames, as opposed to simply amplifying the electrical 
signal, the way a repeater does.
Assuming full duplex, you don't have collisions at all, so you don't have to 
worry about jam signal propagation.  You don't have to worry about preamble 
degradation, since each switch is transmitting a new preamble.

  However, if you have to go through 42 switches to get to your destination, 
that's still bad.  Ethernet still does not guarantee delivery, and each link is 
another chance for a frame to be corrupted and discarded.  Each switch also 
introduces more latency, and a lot of LAN protocols (SMB, I'm looking at you) 
*hate* latency.
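Here is a back-of-the-envelope version of the per-hop argument. It assumes store-and-forward switches, full-size 1518-byte frames on gigabit links, and independent loss on each link. All of these are illustrative assumptions, not measurements.

```python
def store_and_forward_delay_s(links, frame_bytes=1518, link_bps=1_000_000_000, per_switch_s=0.0):
    """Serialization delay when every switch must receive a whole frame before sending it on.

    `links` counts cables on the path; `per_switch_s` is a hypothetical lookup/queueing
    allowance per switch (zero by default).
    """
    serialization = frame_bytes * 8 / link_bps
    switches = max(links - 1, 0)
    return links * serialization + switches * per_switch_s


def delivery_probability(links, per_link_loss):
    """Chance that a frame survives every link, assuming independent loss on each link."""
    return (1 - per_link_loss) ** links


if __name__ == "__main__":
    print(f"{store_and_forward_delay_s(3) * 1e6:.3f} us")  # 36.432 us
    print(delivery_probability(43, 1e-4), delivery_probability(4, 1e-4))
```

Both numbers get worse linearly (delay) or multiplicatively (delivery) with each added switch, which is the point being made above.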

-- Ben






Re: [NTSysADM] Semi-OT: Network problem

2013-09-22 Thread Kurt Buff
Well, I do remember reading a long time ago that traffic shouldn't go
through more than three switches on a LAN (was that referred to as the
diameter? I can't remember) - that pretty much matches the Cisco model
of core, distribution and access, as described here, among many other
places:
http://searchnetworking.techtarget.com/tip/Core-Distribution-and-Access
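One way to put a number on "diameter" is to count the switches a frame crosses between two edge switches. The topology below is hypothetical, so substitute your own switch-to-switch links. For comparison, the classic 802.1D default STP timers are usually described as assuming a diameter of at most seven bridges.

```python
from collections import deque

# Hypothetical core/distribution/access layout; replace with your own switch-to-switch links.
LINKS = [
    ("core", "dist1"), ("core", "dist2"),
    ("dist1", "acc1"), ("dist1", "acc2"), ("dist2", "acc3"),
]


def switches_on_path(links, src, dst):
    """Number of switches (including both ends) on the shortest path from src to dst."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    count = {src: 1}
    queue = deque([src])
    while queue:  # breadth-first search finds the shortest path first
        switch = queue.popleft()
        if switch == dst:
            return count[switch]
        for nxt in neighbors.get(switch, ()):
            if nxt not in count:
                count[nxt] = count[switch] + 1
                queue.append(nxt)
    raise ValueError(f"{dst!r} is not reachable from {src!r}")


def worst_case_switches(links, edge_switches):
    """Largest number of switches crossed between any two edge (access) switches."""
    return max(
        switches_on_path(links, a, b)
        for a in edge_switches
        for b in edge_switches
        if a != b
    )


if __name__ == "__main__":
    print(switches_on_path(LINKS, "acc1", "acc2"))               # 3
    print(worst_case_switches(LINKS, ["acc1", "acc2", "acc3"]))  # 5
```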

On Sun, Sep 22, 2013 at 6:33 PM, Micheal Espinola Jr
michealespin...@gmail.com wrote:
 Personally speaking, I try to stick to it as well.  I've noticed more wonky
 things the more environments diverge from it.  Technically speaking, that
 should not make sense - but this is an unqualified opinion of mine.

 --
 Espi



 On Fri, Sep 20, 2013 at 11:59 AM, Michael B. Smith mich...@smithcons.com
 wrote:

 I still use it.



 Violate the rule at your peril. :P



 From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Jonathan Link
 Sent: Friday, September 20, 2013 2:07 PM
 To: ntsysadm@lists.myitforum.com
 Subject: Re: [NTSysADM] Semi-OT: Network problem



 Is this the equivalent of Vader saying "Your powers are weak, old man" to
 Obi Wan?



 On Fri, Sep 20, 2013 at 1:55 PM, Kurt Buff kurt.b...@gmail.com wrote:

 Sigh. Yes, but...

 The 5-4-3 rule was created when 10BASE5 and 10BASE2 were the only
 types of Ethernet network available. The rule only applies to
 shared-access 10 Mbit/s Ethernet backbones. The rule does not apply to
 switched Ethernet because each port on a switch constitutes a separate
 collision domain.
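For the shared-media case where the rule does apply, checking a single path against it is simple enough to write down. This is only a sketch of the rule of thumb and says nothing about the underlying timing budget. The list-of-booleans input format is a hypothetical choice.

```python
def follows_5_4_3(populated_segments):
    """Check one path through a shared 10 Mbit/s network against the 5-4-3 rule of thumb.

    `populated_segments` has one bool per coax segment on the path, in order:
    True if hosts are attached to it, False for a bare link segment.
    """
    segments = len(populated_segments)
    repeaters = segments - 1           # repeaters sit between consecutive segments
    populated = sum(populated_segments)
    return segments <= 5 and repeaters <= 4 and populated <= 3


if __name__ == "__main__":
    print(follows_5_4_3([True, False, True, False, True]))  # True
    print(follows_5_4_3([True, True, True, True, False]))   # False: four populated segments
```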

 :)

 Kurt

 On Fri, Sep 20, 2013 at 10:37 AM, Michael B. Smith
 mich...@smithcons.com wrote:
  http://en.wikipedia.org/wiki/5-4-3_rule
 
 

  -----Original Message-----
  From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff
  Sent: Friday, September 20, 2013 12:59 PM
  To: NTSysADM@lists.myitforum.com
  Subject: [NTSysADM] Semi-OT: Network problem
 
  All,
 
  In the past couple of weeks, $work has had a problem with network
  interruptions - frequent gaps in network connectivity where all contact is
  lost with servers for brief periods of time (1-2 minutes, usually).
 
  I could see the gaps in the graphs on my (very new and incomplete - long
  story, don't ask) cacti installation. Unfortunately, I've been unable to get
  cacti to graph CPU utilization for the switches, because they're Procurves,
  and I couldn't find a working XML file or configuration for that.
 
  It's always happened while I've been unavailable, until today.
 
  Just now, I was able to show conclusively that our core layer3 switch
  (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during
  these episodes. Volume of traffic is normal - no huge spikes in that, just
  normal variation, AFAICT, from the cacti graphs. I haven't had time to see
  if other switches also spike their CPU, but given the gaps in the graphs, I
  suspect that's the case.
 
  I suspect someone is doing something stupid to create a layer 2 loop, as we
  have lots of little 5 and 8 port switches on desktops and in our engineering
  lab - and in spite of the fact that I've set our core switch as the root of
  the spanning tree.
 
  I'm setting up a box to do a tcpdump in a ring buffer with smallish
  files so that I can do analysis on them more easily.
 
  I'm not a packet analysis guy, though I've done some looking on
  occasion.
 
  Anyone have thoughts on what to look for when I start my analysis?
 
  Kurt
 
 








Re: [NTSysADM] Semi-OT: Network problem

2013-09-22 Thread Micheal Espinola Jr
C-D-A, yep yep.

--
Espi









RE: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Reimer, Mark
I've seen a wire with both ends plugged into a little 5/8 port switch that 
caused the problem. But it was a long down time, until I found the wire.

Mark

-----Original Message-----
From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff
Sent: Friday, September 20, 2013 10:59 AM
To: NTSysADM@lists.myitforum.com
Subject: [NTSysADM] Semi-OT: Network problem

All,

In the past couple of weeks, $work has had a problem with network interruptions 
- frequent gaps in network connectivity where all contact is lost with servers 
for brief periods of time (1-2 minutes, usually).

I could see the gaps in the graphs on my (very new and incomplete - long story, 
don't ask) cacti installation. Unfortunately, I've been unable to get cacti to 
graph CPU utilization for the switches, because they're Procurves, and I 
couldn't find a working XML file or configuration for that.

It's always happened while I've been unavailable, until today.

Just now, I was able to show conclusively that our core layer3 switch (Procurve 
3400cl-48G), which was hit hardest, spikes its CPU to 99% during these 
episodes. Volume of traffic is normal - no huge spikes in that, just normal 
variation, AFAICT, from the cacti graphs. I haven't had time to see if other 
switches also spike their CPU, but given the gaps in the graphs, I suspect 
that's the case.

I suspect someone is doing something stupid to create a layer 2 loop, as we have 
lots of little 5 and 8 port switches on desktops and in our engineering lab - 
and in spite of the fact that I've set our core switch as the root of the 
spanning tree.

I'm setting up a box to do a tcpdump in a ring buffer with smallish files so 
that I can do analysis on them more easily.
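On the question of what to look for: given how this thread turned out, one useful check is which root bridge IDs are being advertised in the capture. The sketch below assumes a classic pcap file from an Ethernet interface (the format tcpdump writes). A capture could be taken with something like "tcpdump -i eth0 -C 100 -W 20 -w stp.pcap ether dst 01:80:c2:00:00:00", where the interface name and sizes are placeholders. The script only understands IEEE-format BPDUs. Cisco PVST+ BPDUs go to a different address and are skipped, and pcapng files are not handled. Seeing more than one root ID on the same segment is worth chasing down.

```python
import struct
from collections import Counter

# IEEE 802.1D/802.1w/802.1s BPDUs go to this multicast address with an LLC header
# (DSAP/SSAP 0x42). Cisco PVST+ BPDUs use 01:00:0c:cc:cc:cd with SNAP and are not counted.
STP_DST_MAC = bytes.fromhex("0180c2000000")
LLC_STP = b"\x42\x42\x03"

LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")  # microsecond / nanosecond pcap
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def advertised_roots(pcap_bytes: bytes) -> Counter:
    """Count (priority, MAC) root bridge IDs seen in STP/RSTP BPDUs in a classic pcap file.

    Assumes Ethernet link type and untagged BPDUs.
    """
    magic = pcap_bytes[:4]
    if magic in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    roots = Counter()
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec (or ts_nsec), captured length, original length
        _, _, captured_len, _ = struct.unpack(endian + "IIII", pcap_bytes[offset:offset + 16])
        frame = pcap_bytes[offset + 16:offset + 16 + captured_len]
        offset += 16 + captured_len

        # 14-byte Ethernet header, 3-byte LLC, then BPDU fields:
        # protocol id (2), version (1), type (1), flags (1), root bridge id (8), ...
        if len(frame) < 30 or frame[:6] != STP_DST_MAC or frame[14:17] != LLC_STP:
            continue
        root_id = frame[22:30]
        priority = int.from_bytes(root_id[:2], "big")  # may include the system-ID extension
        roots[(priority, root_id[2:].hex(":"))] += 1
    return roots


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as capture:
            for (priority, mac), count in advertised_roots(capture.read()).most_common():
                print(f"{path}: root {priority}/{mac} advertised in {count} BPDUs")
```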

I'm not a packet analysis guy, though I've done some looking on occasion.

Anyone have thoughts on what to look for when I start my analysis?

Kurt




Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Ben Scott
On Fri, Sep 20, 2013 at 1:37 PM, Michael B. Smith mich...@smithcons.com wrote:
 http://en.wikipedia.org/wiki/5-4-3_rule

  That's for repeaters.  If that applies, I'd suggest an alternate
approach... ;-)

-- Ben




Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Kurt Buff
On Fri, Sep 20, 2013 at 11:03 AM, Ben Scott mailvor...@gmail.com wrote:
 On Fri, Sep 20, 2013 at 12:58 PM, Kurt Buff kurt.b...@gmail.com wrote:
 ... core layer3 switch ... spikes its CPU to 99% during these episodes ...
 ... Volume of traffic is normal ...

   CPU spikes on a switch are usually something weird.  Normal traffic
 is handled in the switch ASIC and doesn't touch the CPU at all.
 Typically it's things like ACLs or policy routing that hit the CPU.
 Got anything like that going on?

 ... layer2 loop ...

   A layer two loop will light up every switch port on the first
 broadcast packet (or trigger loop detection, which should get logged),
 so I don't think that's it.


No, the configuration of the L3 switch is stupidly simple - I've got
all of my servers plugged into it, and all of my distribution
switches. It's got 34 VLANs defined (max-vlans is set to 100), and
it's x.x.x.1 on every subnet except the L2 VLAN that terminates on the
firewall. I've got 4 x 4-port trunks on it (3 for my VMware boxes and
one for the backup machine - the backup machine's trunk is LACP, the
others are not, since VMware doesn't support LACP).

No particular changes to the config in months (when I set up the LACP
trunk for the backup machine).

No ACLs, and two routes - a DG and a static to another switch for a lab subnet.

Kurt




Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Ben Scott
On Fri, Sep 20, 2013 at 12:58 PM, Kurt Buff kurt.b...@gmail.com wrote:
 ... core layer3 switch ... spikes its CPU to 99% during these episodes ...
 ... Volume of traffic is normal ...

  CPU spikes on a switch are usually something weird.  Normal traffic
is handled in the switch ASIC and doesn't touch the CPU at all.
Typically it's things like ACLs or policy routing that hit the CPU.
Got anything like that going on?

 ... layer2 loop ...

  A layer two loop will light up every switch port on the first
broadcast packet (or trigger loop detection, which should get logged),
so I don't think that's it.
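The loop behaviour described here can be illustrated with a toy model. A switch floods a broadcast out of every port except the one it arrived on, and an Ethernet frame carries no hop limit. So in a simple loop the copies never die out, and in a meshed loop they multiply. The topologies below are hypothetical, and the model ignores host-facing ports.

```python
def flood(adjacency, origin, rounds):
    """Toy model of a broadcast flooded by switches with no spanning tree and no loop protection.

    Each switch forwards every copy out of all its switch-facing ports except the one it
    arrived on; nothing ever expires a copy. Returns the number of copies in flight per round.
    """
    in_flight = [(neighbor, origin) for neighbor in adjacency[origin]]
    history = []
    for _ in range(rounds):
        history.append(len(in_flight))
        in_flight = [
            (neighbor, switch)
            for switch, came_from in in_flight
            for neighbor in adjacency[switch]
            if neighbor != came_from
        ]
    return history


TREE = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
TRIANGLE = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
FULL_MESH = {s: [t for t in "abcd" if t != s] for s in "abcd"}

if __name__ == "__main__":
    print(flood(TREE, "a", 4))       # [2, 0, 0, 0]   copies die out
    print(flood(TRIANGLE, "a", 4))   # [2, 2, 2, 2]   copies circulate forever
    print(flood(FULL_MESH, "a", 4))  # [3, 6, 12, 24] broadcast storm
```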

-- Ben




RE: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Richard McClary
We had a bad weekend a couple of months ago when every 24 minutes our LAN would 
pretty much vanish for about 30-60 seconds.  It turns out what truly appeared 
to be a workgroup switch was actually a hub.  One Friday afternoon it decided 
to show us all why hubs do not belong in networks.
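A hub has no forwarding table at all, and it repeats everything everywhere. That contrast with a switch can be sketched as follows. This is a textbook transparent-bridge model with no MAC aging and no VLANs, not the behaviour of any particular product, and it makes no attempt to explain the 24-minute cycle.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def hub_output_ports(ports, in_port):
    """A hub repeats every frame out of every other port."""
    return set(ports) - {in_port}


class LearningSwitch:
    """Minimal transparent-bridge forwarding: learn source MACs, then forward or flood by destination."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def output_ports(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port
        known_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or known_port is None:
            return self.ports - {in_port}  # flood, like a hub would
        if known_port == in_port:
            return set()                   # destination is on the arrival segment: filter
        return {known_port}
```

Only unknown and broadcast destinations get flooded by the switch. The hub floods every frame.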

--
richard






Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Kurt Buff
Yes, that's why on my other switches (Procurve 2510-48), I have set up
loop-detect parameters, in addition to spanning tree. I have them lock
out the offending port for 10 minutes.

Kurt
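The loop-detect-plus-lockout behaviour Kurt describes can be pictured with a small model: a switch sends probe frames carrying its own identity, and if one of its own probes comes back in on a port, that port closes a loop and is disabled for a lockout period (600 seconds here, matching the 10 minutes above). This is a minimal sketch under the assumption of a simple probe-and-listen mechanism; the class and method names are hypothetical, and it is not HP's actual implementation or CLI.

```python
class LoopProtectModel:
    """Toy model of loop detection with a timed port lockout (hypothetical)."""

    def __init__(self, switch_id, lockout_seconds=600):
        self.switch_id = switch_id
        self.lockout_seconds = lockout_seconds
        self.disabled_until = {}  # port -> time at which it comes back up

    def probe(self):
        # The probe payload is simply this switch's identity.
        return self.switch_id

    def on_probe_received(self, port, sender_id, now):
        """Return True if the probe shows a loop and the port was locked out."""
        if sender_id == self.switch_id:
            # Our own probe came back: this port closes a loop.
            self.disabled_until[port] = now + self.lockout_seconds
            return True
        return False

    def port_enabled(self, port, now):
        return now >= self.disabled_until.get(port, 0)


lp = LoopProtectModel("sw1")
lp.on_probe_received(3, "sw2", now=0)         # a neighbour's probe: ignored
lp.on_probe_received(7, lp.probe(), now=100)  # our own probe returned on port 7
print(lp.port_enabled(7, now=699), lp.port_enabled(7, now=700))  # False True
```

Mark's case below, a single cable with both ends in the same small switch, is exactly the situation this catches: the probe leaves one port and arrives back on another port of the same switch.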

On Fri, Sep 20, 2013 at 10:53 AM, Reimer, Mark mark.rei...@prairie.edu wrote:
 I've seen this caused by a single cable with both ends plugged into the same 
 little 5- or 8-port switch. It was a long outage before I found the cable.

 Mark

 -----Original Message-----
 From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] 
 On Behalf Of Kurt Buff
 Sent: Friday, September 20, 2013 10:59 AM
 To: NTSysADM@lists.myitforum.com
 Subject: [NTSysADM] Semi-OT: Network problem

 All,

 In the past couple of weeks, $work has had a problem with network 
 interruptions - frequent gaps in network connectivity where all contact is 
 lost with servers for brief periods of time (1-2 minutes, usually).

 I could see the gaps in the graphs on my (very new and incomplete - long 
 story, don't ask) cacti installation. Unfortunately, I've been unable to get 
 cacti to graph CPU utilization for the switches, because they're Procurves, 
 and I couldn't find a working XML file or configuration for that.

 It's always happened while I've been unavailable, until today.

 Just now, I was able to show conclusively that our core layer3 switch 
 (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during 
 these episodes. Volume of traffic is normal - no huge spikes in that, just 
 normal variation, AFAICT, from the cacti graphs. I haven't had time to see if 
 other switches also spike their CPU, but given the gaps in the graphs, I 
 suspect that's the case.

 I suspect someone is doing something stupid to create a layer-2 loop, as we have 
 lots of little 5 and 8 port switches on desktops and in our engineering lab - 
 and in spite of the fact that I've set our core switch as the root of the 
 spanning tree.
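Setting the core switch as the root only holds as long as it keeps the lowest bridge ID. In 802.1D spanning tree the bridge ID is the bridge priority followed by the MAC address, and the numerically lowest ID wins the root election; any switch that advertises a lower priority (or the same priority and a lower MAC) takes over. A minimal sketch of that comparison, with hypothetical switch names, priorities, and MAC addresses:

```python
def elect_root(bridges):
    """Return the name of the bridge that wins the STP root election.

    bridges: dict of name -> (priority, "aa:bb:cc:dd:ee:ff").
    The bridge ID is priority first, then MAC; the lowest ID wins.
    """
    def bridge_id(item):
        _, (priority, mac) = item
        return (priority, int(mac.replace(":", ""), 16))

    name, _ = min(bridges.items(), key=bridge_id)
    return name


# Hypothetical values: core set to a low priority, lab switch at the default.
bridges = {"core": (4096, "00:1a:4b:00:00:01"), "lab1": (32768, "00:00:0c:00:00:02")}
print(elect_root(bridges))  # core
bridges["lab2"] = (0, "00:00:0c:00:00:03")  # a lab switch configured with priority 0
print(elect_root(bridges))  # lab2
```

Note that the MAC address only breaks ties; a lab switch with the lowest MAC in the building still loses to a core switch with a lower priority.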

 I'm setting up a box to do a tcpdump in a ring buffer with smallish files so 
 that I can do analysis on them more easily.
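A tcpdump ring buffer like the one Kurt describes is normally built from tcpdump's `-C` (rotate after this many millions of bytes) and `-W` (keep at most this many files, then overwrite the oldest) options. The helper below only assembles the argument list; the interface name, output path, and sizes are hypothetical placeholders, and actually running it (for example with `subprocess.run(cmd)`) needs root on the capture box.

```python
import shlex


def ring_buffer_capture_cmd(interface, out_prefix, file_mb=50, file_count=20):
    """Build a tcpdump command line that writes a rotating ring of capture files.

    -C starts a new file after roughly file_mb million bytes, and -W caps the
    number of files, so the oldest file is overwritten once the ring is full.
    """
    return [
        "tcpdump",
        "-i", interface,   # capture interface, e.g. one cabled to the mirror port
        "-n",              # don't resolve names
        "-w", out_prefix,  # tcpdump appends a file number to this name
        "-C", str(file_mb),
        "-W", str(file_count),
    ]


cmd = ring_buffer_capture_cmd("eth1", "/var/tmp/core-mirror.pcap")
print(shlex.join(cmd))
# tcpdump -i eth1 -n -w /var/tmp/core-mirror.pcap -C 50 -W 20
```

Small files keep each capture quick to open, and the fixed file count means the box can be left running until the next outage without filling its disk.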

 I'm not a packet analysis guy, though I've done some looking on occasion.

 Anyone have thoughts on what to look for when I start my analysis?

 Kurt
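One answer to "what to look for" is a quick per-second tally of broadcast frames and spanning-tree BPDUs around each outage, plus the set of root bridge IDs the BPDUs advertise; more than one root ID on the same segment suggests switches disagreeing about who the root is. The sketch below is a minimal standard-library reader, assuming classic libpcap files (tcpdump's default output; pcapng files will be rejected), Ethernet link type, and untagged IEEE BPDUs for the root-ID extraction. Cisco PVST+ BPDUs (destination 01:00:0c:cc:cc:cd) are counted but not decoded.

```python
import struct
from collections import Counter

IEEE_STP_DST = bytes.fromhex("0180c2000000")    # 802.1D / 802.1w BPDUs
CISCO_PVST_DST = bytes.fromhex("01000ccccccd")  # Cisco PVST+ BPDUs
BROADCAST_DST = b"\xff" * 6


def read_pcap(data):
    """Yield (timestamp_seconds, frame_bytes) from a classic libpcap file."""
    magic = data[:4]
    if magic == bytes.fromhex("d4c3b2a1"):
        endian = "<"
    elif magic == bytes.fromhex("a1b2c3d4"):
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng?)")
    if struct.unpack_from(endian + "I", data, 20)[0] != 1:
        raise ValueError("link type is not Ethernet")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, _usec, incl_len, _orig = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield ts_sec, data[offset:offset + incl_len]
        offset += incl_len


def summarize(data):
    """Per-second broadcast and BPDU counts, plus root bridge IDs seen."""
    broadcasts, bpdus, roots = Counter(), Counter(), Counter()
    for ts, frame in read_pcap(data):
        dst = frame[:6]
        if dst == BROADCAST_DST:
            broadcasts[ts] += 1
        elif dst in (IEEE_STP_DST, CISCO_PVST_DST):
            bpdus[ts] += 1
            # Untagged IEEE config (0x00) or RST (0x02) BPDU: 14-byte Ethernet
            # header, LLC 42 42 03, protocol id(2), version(1), type(1),
            # flags(1), then the 8-byte root ID (priority + MAC) at offset 22.
            if (dst == IEEE_STP_DST and len(frame) >= 30
                    and frame[14:17] == b"\x42\x42\x03" and frame[20] in (0x00, 0x02)):
                priority = int.from_bytes(frame[22:24], "big")
                roots[(priority, frame[24:30].hex(":"))] += 1
    return broadcasts, bpdus, roots
```

Usage: `with open(path, "rb") as f: broadcasts, bpdus, roots = summarize(f.read())` for each file in the ring, then compare the counts for the seconds around a known outage against a quiet period.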






RE: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Michael B. Smith
I still use it.

Violate the rule at your peril. :P

From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On 
Behalf Of Jonathan Link
Sent: Friday, September 20, 2013 2:07 PM
To: ntsysadm@lists.myitforum.com
Subject: Re: [NTSysADM] Semi-OT: Network problem

Is this the equivalent of Vader saying "Your powers are weak, old man" to 
Obi-Wan?

On Fri, Sep 20, 2013 at 1:55 PM, Kurt Buff kurt.b...@gmail.com wrote:
Sigh. Yes, but...

The 5-4-3 rule was created when 10BASE5 and 10BASE2 were the only
types of Ethernet network available. The rule only applies to
shared-access 10 Mbit/s Ethernet backbones. The rule does not apply to
switched Ethernet because each port on a switch constitutes a separate
collision domain.

:)

Kurt
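Kurt's and Ben's points fit together: the 5-4-3 rule limits a single shared collision domain (at most 5 segments, 4 repeaters, and only 3 of the segments populated with stations), and a hub is just a multiport repeater, while a switch starts a new collision domain on every port. The sketch below checks a path against the rule, splitting it at switches first; the hop labels ("populated", "link", "repeater", "switch") are hypothetical names chosen for illustration.

```python
def collision_domains(path):
    """Split a path of hops into collision domains; a switch ends one domain
    and starts the next. Hops: 'populated', 'link', 'repeater', 'switch'."""
    domains, current = [], []
    for hop in path:
        if hop == "switch":
            domains.append(current)
            current = []
        else:
            current.append(hop)
    domains.append(current)
    return domains


def obeys_5_4_3(path):
    """True if every collision domain on the path has at most 5 segments,
    4 repeaters (hubs count as repeaters), and 3 populated segments."""
    for domain in collision_domains(path):
        segments = [h for h in domain if h in ("populated", "link")]
        if (len(segments) > 5 or domain.count("repeater") > 4
                or domain.count("populated") > 3):
            return False
    return True


maxed = ["populated", "repeater", "link", "repeater", "populated",
         "repeater", "link", "repeater", "populated"]
print(obeys_5_4_3(maxed))                                 # True: exactly 5-4-3
print(obeys_5_4_3(maxed + ["repeater", "populated"]))     # False: 6 segments
print(obeys_5_4_3(maxed + ["switch", "populated"]))       # True: new domain
```

The last line is Ben's observation below in code form: once a switch sits between two segments, each side is checked as its own collision domain.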

On Fri, Sep 20, 2013 at 10:37 AM, Michael B. Smith
mich...@smithcons.com wrote:
 http://en.wikipedia.org/wiki/5-4-3_rule









Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Ben Scott
On Fri, Sep 20, 2013 at 2:12 PM, Kurt Buff kurt.b...@gmail.com wrote:
 No, the configuration of the L3 switch is stupidly simple ...

  Very odd that you're getting CPU spikes, then.

  You've done a 'show log -a' on the switch right after the trouble
and found nothing helpful, I presume?

  Have you checked for firmware updates?

 ... the backup machine's trunk is LACP ...

  Is the backup machine behaving itself?  LACP reconfiguration prolly
hits the CPU.  STP will hit the CPU.  But I'm shooting in the dark,
here.

  I'd call HP support.  They know what magic commands to issue to get
the switch to cough up relevant debug info.

 No ACLs, and two routes - a DG and a static to another switch for a lab 
 subnet.

  I believe routing is done on ASICs with that model anyway.

-- Ben




Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Kurt Buff
No, I figured he was having me on...

Kurt

On Fri, Sep 20, 2013 at 11:07 AM, Jonathan Link jonathan.l...@gmail.com wrote:
 Is this the equivalent of Vader saying "Your powers are weak, old man" to
 Obi-Wan?









Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Ben Scott
On Fri, Sep 20, 2013 at 2:59 PM, Michael B. Smith mich...@smithcons.com wrote:
 I still use it.
 Violate the rule at your peril. :P

  Technically speaking, if you're using switches everywhere, you're
still following the rule, because every link is its own collision
domain.  ;-)

-- Ben




Re: [NTSysADM] Semi-OT: Network problem

2013-09-20 Thread Ben Scott
On Fri, Sep 20, 2013 at 3:45 PM, Kurt Buff kurt.b...@gmail.com wrote:
   You've done a 'show log -a' on the switch right after the trouble
 and found nothing helpful, I presume?

 On the 3400cl, 'show log' says the same as 'sho log -a' - nothing of
 interest.

  The -a just tells it to include events from before the last
reboot.  I threw that in in case you had rebooted trying to clear the
trouble.

 Just that the monitor port has a high collision or drop rate
 once in a while, and that doesn't correlate with the network
 interruptions.

  I wouldn't *think* port mirroring would need the CPU for anything,
but I don't actually know.

 I'm going to take a look at the packets I've captured first, and see
 what I can, but HP support might well be the answer.

  The reason I suggest calling support is that they're likely to be
able to tell you exactly what is causing the CPU to spike.  They
might not solve the root-cause problem for you, but that's info you
need and don't have.

-- Ben