Re: [NTSysADM] Semi-OT: Network problem
Another not-quite-zombie thread update:

After mucho packet capturing, and trying to figure stuff out myself, I called in the cavalry. I sent the packets for a small outbreak to an outside firm that I've used before, and they handed it to their packethead.

It is/was an STP problem, coming from the Cisco switches in the lab: several in there are announcing that they are the root bridge, and the prod and dev switches ended up fighting.

I've explained the problem to the director of engineering, and they've come up with a router and a couple of their own switches. I'm in the process of migrating their address space/VLANs off of my equipment onto their router/switches. I've set up a /30 between the networks, and will be putting up routes pointing to the new connection as we migrate stuff off.

BTW - I came across the following while doing some of the research - it's pretty good: http://www.cisco.com/image/gif/paws/10556/spanning_tree1.swf

Kurt

On Sun, Sep 22, 2013 at 7:05 PM, Micheal Espinola Jr michealespin...@gmail.com wrote:

C-D-A, yep yep.

-- Espi

On Sun, Sep 22, 2013 at 6:56 PM, Kurt Buff kurt.b...@gmail.com wrote:

Well, I do remember reading a long time ago that traffic shouldn't go through more than three switches on a LAN (was that referred to as the diameter? I can't remember) - that pretty much matches the Cisco model of core, distribution and access, as described here, among many other places: http://searchnetworking.techtarget.com/tip/Core-Distribution-and-Access

On Sun, Sep 22, 2013 at 6:33 PM, Micheal Espinola Jr michealespin...@gmail.com wrote:

Personally speaking, I try to stick to it as well. I've noticed more wonky things the more environments diverge from it. Technically speaking, that should not make sense - but this is an unqualified opinion of mine.

-- Espi

On Fri, Sep 20, 2013 at 11:59 AM, Michael B. Smith mich...@smithcons.com wrote:

I still use it. Violate the rule at your peril. :P

From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Jonathan Link
Sent: Friday, September 20, 2013 2:07 PM
To: ntsysadm@lists.myitforum.com
Subject: Re: [NTSysADM] Semi-OT: Network problem

Is this the equivalent of Vader saying "Your powers are weak, old man" to Obi Wan?

On Fri, Sep 20, 2013 at 1:55 PM, Kurt Buff kurt.b...@gmail.com wrote:

Sigh. Yes, but...

The 5-4-3 rule was created when 10BASE5 and 10BASE2 were the only types of Ethernet network available. The rule only applies to shared-access 10 Mbit/s Ethernet backbones. The rule does not apply to switched Ethernet because each port on a switch constitutes a separate collision domain.

:)

Kurt

On Fri, Sep 20, 2013 at 10:37 AM, Michael B. Smith mich...@smithcons.com wrote:

http://en.wikipedia.org/wiki/5-4-3_rule

-----Original Message-----
From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff
Sent: Friday, September 20, 2013 12:59 PM
To: NTSysADM@lists.myitforum.com
Subject: [NTSysADM] Semi-OT: Network problem

All,

In the past couple of weeks, $work has had a problem with network interruptions - frequent gaps in network connectivity where all contact is lost with servers for brief periods of time (1-2 minutes, usually).

I could see the gaps in the graphs on my (very new and incomplete - long story, don't ask) cacti installation. Unfortunately, I've been unable to get cacti to graph CPU utilization for the switches, because they're Procurves, and I couldn't find a working XML file or configuration for that.

It's always happened while I've been unavailable, until today. Just now, I was able to show conclusively that our core layer3 switch (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during these episodes. Volume of traffic is normal - no huge spikes in that, just normal variation, AFAICT, from the cacti graphs.

I haven't had time to see if other switches also spike their CPU, but given the gaps in the graphs, I suspect that's the case.

I suspect someone is doing something stupid to create a layer2 loop, as we have lots of little 5 and 8 port switches on desktops and in our engineering lab - and in spite of the fact that I've set our core switch as the root of the spanning tree.

I'm setting up a box to do a tcpdump in a ring buffer with smallish files so that I can do analysis on them more easily. I'm not a packet analysis guy, though I've done some looking on occasion.

Anyone have thoughts on what to look for when I start my analysis?

Kurt
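For anyone wanting to try the "tcpdump in a ring buffer with smallish files" idea from the original post, here is a minimal sketch of how the command line might be put together. It only builds the argument list; it does not run tcpdump. The interface name, output path, file size and file count are all made-up example values, and the optional filter assumes you only care about spanning-tree BPDUs (the IEEE bridge group address plus the Cisco PVST+ address). Leave the filter off if you want to see everything during an outage.

```python
import shlex

# Destination MACs used by standard STP BPDUs and by Cisco PVST+ BPDUs.
STP_FILTER = "ether dst 01:80:c2:00:00:00 or ether dst 01:00:0c:cc:cc:cd"


def ring_capture_command(interface, prefix, file_mb=50, file_count=40, bpf_filter=""):
    """Build a tcpdump argv list for a fixed-size rotating capture.

    All defaults are illustrative assumptions, not recommendations.
    """
    cmd = [
        "tcpdump",
        "-i", interface,        # capture interface (hypothetical name in the example)
        "-n",                   # do not resolve addresses to names
        "-s", "0",              # use tcpdump's default (large) snapshot length
        "-C", str(file_mb),     # roll to a new file after about this many million bytes
        "-W", str(file_count),  # keep this many files, then overwrite from the first
        "-w", prefix,           # base file name; tcpdump appends a file number
    ]
    if bpf_filter:
        cmd.append(bpf_filter)  # the whole filter expression as one argument
    return cmd


if __name__ == "__main__":
    argv = ring_capture_command("eth0", "/var/tmp/ring.pcap", bpf_filter=STP_FILTER)
    print(shlex.join(argv))
```

With `-C` and `-W` together the total disk use stays bounded at roughly `file_mb * file_count` million bytes, which is the point of the ring buffer: the box can run unattended until the next episode.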
Re: [NTSysADM] Semi-OT: Network problem
STP will never stop being something to bite us all in the ass... -- Espi
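Since the diagnosis in this thread was several switches all announcing themselves as root, a small sketch of how root election works may help. This is a simplified model, not the switches' actual code: the bridge with the numerically lowest (priority, MAC) pair wins, and in a settled spanning tree every BPDU for a given instance should name the same root. The switch names, priorities and MAC addresses below are invented, and the `competing_roots` helper is a hypothetical way of summarizing BPDUs you have already decoded from a capture. Note that with per-VLAN spanning tree, different roots on different VLANs can be perfectly normal.

```python
from collections import defaultdict


def bridge_id(priority, mac):
    """Comparable bridge ID: priority first, MAC address as the tie-breaker."""
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the bridge with the numerically lowest bridge ID."""
    return min(bridges, key=lambda b: bridge_id(b["priority"], b["mac"]))


def competing_roots(bpdus):
    """Map each advertised root ID to the set of senders advertising it.

    More than one key for the same spanning-tree instance suggests the
    switches disagree (or are mid-reconvergence).
    """
    claims = defaultdict(set)
    for b in bpdus:
        claims[(b["root_priority"], b["root_mac"])].add(b["sender_mac"])
    return dict(claims)


if __name__ == "__main__":
    # Invented example: the core was given a low priority, but a lab switch
    # with the same priority and a lower MAC would still win the election.
    bridges = [
        {"name": "core", "priority": 4096, "mac": "00:11:22:33:44:55"},
        {"name": "lab-a", "priority": 32768, "mac": "00:00:0c:aa:aa:aa"},
        {"name": "lab-b", "priority": 4096, "mac": "00:00:0c:bb:bb:bb"},
    ]
    print(elect_root(bridges)["name"])

    bpdus = [
        {"sender_mac": "00:11:22:33:44:55", "root_priority": 4096, "root_mac": "00:11:22:33:44:55"},
        {"sender_mac": "00:00:0c:aa:aa:aa", "root_priority": 32768, "root_mac": "00:00:0c:aa:aa:aa"},
    ]
    for root, senders in competing_roots(bpdus).items():
        print(root, sorted(senders))
```

The takeaway for a mixed prod/lab network is that setting the core as root only holds as long as nothing else on the same layer 2 domain advertises a better bridge ID.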
Re: [NTSysADM] Semi-OT: Network problem
A better allusion might be a branch swatting us in the face. I've never known a tree to bite. On Wed, Sep 25, 2013 at 4:33 PM, Micheal Espinola Jr michealespin...@gmail.com wrote: STP will never stop being something to bite us all in the ass... -- Espi
RE: [NTSysADM] Semi-OT: Network problem
Well, its bark is worse than its bite... From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Micheal Espinola Jr Sent: Wednesday, September 25, 2013 2:45 PM To: ntsysadm@lists.myitforum.com Subject: Re: [NTSysADM] Semi-OT: Network problem I like it! :-) -- Espi On Wed, Sep 25, 2013 at 1:37 PM, Jonathan Link jonathan.l...@gmail.com wrote: A better allusion might be a branch swatting us in the face. I've never known a tree to bite. On Wed, Sep 25, 2013 at 4:33 PM, Micheal Espinola Jr michealespin...@gmail.com wrote: STP will never stop being something to bite us all in the ass... Snip
RE: [NTSysADM] Semi-OT: Network problem
And bit rot is real.

-----Original Message-----
From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Ben Scott
Sent: Monday, September 23, 2013 5:48 PM
To: ntsysadm@lists.myitforum.com
Subject: Re: [NTSysADM] Semi-OT: Network problem

On Sun, Sep 22, 2013 at 9:56 PM, Kurt Buff kurt.b...@gmail.com wrote:

Well, I do remember reading a long time ago that traffic shouldn't go through more than three switches on a LAN (was that referred to as the diameter? I can't remember) - that pretty much matches the Cisco model of core, distribution and access, as described here, among many other places:

Well, there's a difference between "not a best practice" and "violates the specification".

The 5-4-3 rule arises because of how classic Ethernet works. It's a shared medium, meaning all nodes on the network are connected to a common signalling bus. It takes time for a signal to travel through the bus. If the bus is too long (or has too many repeaters), a signal transmitted from one end may not reach the other end before various time windows close, leading to things like missed collisions, missed preambles, and the like. Worse, they won't be retried by the link layer when they should be. (Ethernet does not guarantee delivery, but it does have *some* retry logic built-in.) Silently corrupted payload is even possible.

(The 5-4-3 rule is technically only a guideline because Ethernet doesn't care about repeater hops, it cares about signal timings. The only way to truly confirm adherence to all aspects of the specification was with a network analyzer. Those are expensive, so a rule of thumb was needed. 5-4-3 worked for most everything. If your equipment happened to need less margin within the spec, your network could be bigger.)

Modern Ethernet puts a transceiver at each end of each cable. Each cable is a point-to-point link, as far as the MAC sublayer is concerned. The switches receive and buffer frames, as opposed to simply amplifying the electrical signal, the way a repeater does. Assuming full duplex, you don't have collisions at all, so you don't have to worry about jam signal propagation. You don't have to worry about preamble degradation, since each switch is transmitting a new preamble.

However, if you have to go through 42 switches to get to your destination, that's still bad. Ethernet still does not guarantee delivery, and each link is another chance for a frame to be corrupted and discarded. Each switch also introduces more latency, and a lot of LAN protocols (SMB, I'm looking at you) *hate* latency.

-- Ben
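To put rough numbers on the points above, here is a back-of-the-envelope sketch. It assumes store-and-forward switches that each re-serialize a full-size 1518-byte frame on a 1 Gbit/s link, and it ignores queuing, cable propagation and switch processing time, so real latency will differ. The per-link loss rate is a purely hypothetical figure chosen to show how loss compounds. The one classic-Ethernet number included is the 512-bit slot time, which is the timing window that rules of thumb like 5-4-3 were protecting on 10 Mbit/s shared media.

```python
def serialization_delay_us(frame_bytes, link_bps):
    """Time in microseconds to clock one frame onto a link."""
    return frame_bytes * 8 / link_bps * 1e6


def store_and_forward_latency_us(switch_hops, frame_bytes=1518, link_bps=1e9):
    """Latency added by switches that each receive a whole frame, then resend it.

    Simplifying assumption: every hop runs at the same speed and there is
    no queuing delay.
    """
    return switch_hops * serialization_delay_us(frame_bytes, link_bps)


def delivery_probability(links, per_link_loss):
    """Chance a frame survives every link if losses are independent."""
    return (1 - per_link_loss) ** links


# Classic 10 Mbit/s Ethernet: a collision must be detectable within
# 512 bit times, which bounds the round trip across the collision domain.
SLOT_TIME_US = 512 / 10e6 * 1e6

if __name__ == "__main__":
    print(f"slot time at 10 Mbit/s: {SLOT_TIME_US:.1f} us")
    for hops in (3, 42):
        added = store_and_forward_latency_us(hops)
        print(f"{hops} switches add about {added:.1f} us per full-size frame")
    # Hypothetical loss rate of one frame in a million per link.
    print(f"delivery over 43 links: {delivery_probability(43, 1e-6):.6f}")
```

Per frame the added delay is small, but chatty request/response protocols pay it on every round trip, which is where the latency complaint comes from.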
Re: [NTSysADM] Semi-OT: Network problem
Well, I do remember reading a long time ago that traffic shouldn't go through more than three switches on a LAN (was that referred to as the diameter? I can't remember) - that pretty much matches the Cisco model of core, distribution and access, as described here, among many other places: http://searchnetworking.techtarget.com/tip/Core-Distribution-and-Access
Re: [NTSysADM] Semi-OT: Network problem
C-D-A, yep yep. -- Espi
RE: [NTSysADM] Semi-OT: Network problem
I've seen a wire with both ends plugged into a little 5/8 port switch that caused the problem. But it was a long down time, until I found the wire. Mark
Re: [NTSysADM] Semi-OT: Network problem
On Fri, Sep 20, 2013 at 1:37 PM, Michael B. Smith mich...@smithcons.com wrote: http://en.wikipedia.org/wiki/5-4-3_rule That's for repeaters. If that applies, I'd suggest an alternate approach... ;-) -- Ben
Re: [NTSysADM] Semi-OT: Network problem
On Fri, Sep 20, 2013 at 11:03 AM, Ben Scott mailvor...@gmail.com wrote:

On Fri, Sep 20, 2013 at 12:58 PM, Kurt Buff kurt.b...@gmail.com wrote:

... core layer3 switch ... spikes its CPU to 99% during these episodes ...
... Volume of traffic is normal ...

CPU spikes on a switch are usually something weird. Normal traffic is handled in the switch ASIC and doesn't touch the CPU at all. Typically it's things like ACLs or policy routing that hit the CPU. Got anything like that going on?

... layer2 loop ...

A layer two loop will light up every switch port on the first broadcast packet (or trigger loop detection, which should get logged), so I don't think that's it.

No, the configuration of the L3 switch is stupidly simple - I've got all of my servers plugged into it, and all of my distribution switches. It's got 34 VLANs defined (max-vlans is set to 100), and it's x.x.x.1 on every subnet except the L2 VLAN that terminates on the firewall. I've got 4 x 4-port trunks on it (3 for my VMware boxes and one for the backup machine - the backup machine's trunk is LACP, the others are not, since VMware doesn't support LACP).

No particular changes to the config in months (since I set up the LACP trunk for the backup machine). No ACLs, and two routes - a DG and a static to another switch for a lab subnet.

Kurt
Re: [NTSysADM] Semi-OT: Network problem
On Fri, Sep 20, 2013 at 12:58 PM, Kurt Buff kurt.b...@gmail.com wrote:

... core layer3 switch ... spikes its CPU to 99% during these episodes ...
... Volume of traffic is normal ...

CPU spikes on a switch are usually something weird. Normal traffic is handled in the switch ASIC and doesn't touch the CPU at all. Typically it's things like ACLs or policy routing that hit the CPU. Got anything like that going on?

... layer2 loop ...

A layer two loop will light up every switch port on the first broadcast packet (or trigger loop detection, which should get logged), so I don't think that's it.

-- Ben
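As a rough illustration of why a layer two loop is so hard to miss, here is a toy model of broadcast flooding with no spanning tree and no loop detection at all. It is a sketch under big simplifying assumptions: each link joins two distinct switches, every switch forwards a broadcast out of every link except the one it arrived on, hosts are ignored, and all links take one time step. The switch names and topologies are made up.

```python
def flood_counts(links, start, steps):
    """Count broadcast copies in flight at each step, starting at `start`.

    `links` is a list of (switch_a, switch_b) pairs. No loop prevention
    is modelled, which is the point of the exercise.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Each in-flight copy is recorded as (sending switch, receiving switch).
    in_flight = [(start, n) for n in neighbors[start]]
    counts = [len(in_flight)]
    for _ in range(steps):
        next_round = []
        for src, dst in in_flight:
            # Flood out of every link except the one the copy came in on.
            next_round.extend((dst, n) for n in neighbors[dst] if n != src)
        in_flight = next_round
        counts.append(len(in_flight))
    return counts


if __name__ == "__main__":
    chain = [("A", "B"), ("B", "C")]                  # no loop
    triangle = [("A", "B"), ("B", "C"), ("C", "A")]   # one loop
    mesh = triangle + [("A", "D"), ("B", "D"), ("C", "D")]  # several loops
    print("chain   ", flood_counts(chain, "A", 4))
    print("triangle", flood_counts(triangle, "A", 4))
    print("mesh    ", flood_counts(mesh, "A", 4))
```

In the loop-free chain the broadcast dies out after it reaches the far end. In the triangle the copies circulate forever, and with more redundant links the number of copies doubles every step, which matches the "lights up every switch port" description.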
RE: [NTSysADM] Semi-OT: Network problem
We had a bad weekend a couple of months ago when every 24 minutes our LAN would pretty much vanish for about 30-60 seconds. It turns out what truly appeared to be a workgroup switch was actually a hub. One Friday afternoon it decided to show us all why hubs do not belong in networks. -- richard
Re: [NTSysADM] Semi-OT: Network problem
Yes, that's why on my other switches (Procurve 2510-48), I have set up loop-detect parameters, in addition to spanning tree. I have it lock out the port for 10 minutes. Kurt On Fri, Sep 20, 2013 at 10:53 AM, Reimer, Mark mark.rei...@prairie.edu wrote: I've seen a wire with both ends plugged into a little 5/8 port switch that caused the problem. But it was a long down time, until I found the wire. Mark -Original Message- From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff Sent: Friday, September 20, 2013 10:59 AM To: NTSysADM@lists.myitforum.com Subject: [NTSysADM] Semi-OT: Network problem All, In the past couple of weeks, $work has had a problem with network interruptions - frequent gaps in network connectivity were all contact is lost with servers for brief periods of time (1-2 minutes, usually). I could see the gaps in the graphs on my (very new and incomplete - long story, don't ask) cacti installation. Unfortunately, I've been unable to get cacti to graph CPU utilization for the switches, because they're Procurves, and I couldn't find a working XML file or configuration for that. It's always happened while I've been unavailable, until today. Just now, I was able to show conclusively that our core layer3 switch (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during these episodes. Volume of traffic is normal - ho huge spikes in that, just normal variation, AFAICT, from the cacti graphs. I haven't had time to see if other switches also spike their CPU, but given the gaps in the graphs, I suspect that's the case. I suspect someone is doing something stupid to create layer2 loop, as we have lots of little 5 and 8 port switches on desktops and in our engineering lab - and in spite of the fact that I've set our core switch as the root of the spanning tree. I'm setting up a box to do a tcpdump in a ring buffer with smallish files so that I can do analysis on them more easily. 
I'm not a packet analysis guy, though I've done some looking on occasion. Anyone have thoughts on what to look for when I start my analysis? Kurt
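For anyone following along, a "tcpdump in a ring buffer with smallish files" can be set up with tcpdump's own `-C` (file size, in millions of bytes) and `-W` (file count) options, which together rotate through a fixed set of files. The Python sketch below only builds the command line; the interface name `eth1`, the path `/var/tmp/loop.pcap`, and the sizes are assumptions for illustration, not what was actually used here.

```python
import shlex


def tcpdump_ring_command(interface, out_path, file_mb=50, file_count=40):
    """Build a tcpdump argv that rotates through a fixed set of capture files."""
    if file_mb <= 0 or file_count <= 0:
        raise ValueError("file_mb and file_count must be positive")
    return [
        "tcpdump",
        "-i", interface,         # capture interface (hypothetical name below)
        "-n",                    # no name resolution while capturing
        "-s", "0",               # snaplen 0 = tcpdump's default maximum
        "-C", str(file_mb),      # start a new file after about this many million bytes
        "-W", str(file_count),   # keep this many files, then overwrite the oldest
        "-w", out_path,          # base file name; tcpdump appends a number
    ]


def ring_size_mb(file_mb, file_count):
    """Approximate upper bound on disk used by the whole ring."""
    return file_mb * file_count


# Hypothetical interface and path, chosen only for the example.
cmd = tcpdump_ring_command("eth1", "/var/tmp/loop.pcap")
print(shlex.join(cmd))
print(ring_size_mb(50, 40), "MB on disk at most (approx.)")
```

Smaller files are easier to open in an analyzer; the trade-off is that the ring covers less time for the same disk budget, so the count may need raising if the outages happen while nobody is watching.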
RE: [NTSysADM] Semi-OT: Network problem
I still use it. Violate the rule at your peril. :P From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Jonathan Link Sent: Friday, September 20, 2013 2:07 PM To: ntsysadm@lists.myitforum.com Subject: Re: [NTSysADM] Semi-OT: Network problem Is this the equivalent of Vader saying Your powers are weak, old man to Obi Wan? On Fri, Sep 20, 2013 at 1:55 PM, Kurt Buff kurt.b...@gmail.com wrote: Sigh. Yes, but... The 5-4-3 rule was created when 10BASE5 and 10BASE2 were the only types of Ethernet network available. The rule only applies to shared-access 10 Mbit/s Ethernet backbones. The rule does not apply to switched Ethernet because each port on a switch constitutes a separate collision domain. :) Kurt On Fri, Sep 20, 2013 at 10:37 AM, Michael B. Smith mich...@smithcons.com wrote: http://en.wikipedia.org/wiki/5-4-3_rule -Original Message- From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff Sent: Friday, September 20, 2013 12:59 PM To: NTSysADM@lists.myitforum.com Subject: [NTSysADM] Semi-OT: Network problem All, In the past couple of weeks, $work has had a problem with network interruptions - frequent gaps in network connectivity where all contact is lost with servers for brief periods of time (1-2 minutes, usually). I could see the gaps in the graphs on my (very new and incomplete - long story, don't ask) cacti installation. Unfortunately, I've been unable to get cacti to graph CPU utilization for the switches, because they're Procurves, and I couldn't find a working XML file or configuration for that. It's always happened while I've been unavailable, until today. Just now, I was able to show conclusively that our core layer3 switch (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during these episodes. 
Volume of traffic is normal - no huge spikes in that, just normal variation, AFAICT, from the cacti graphs. I haven't had time to see if other switches also spike their CPU, but given the gaps in the graphs, I suspect that's the case. I suspect someone is doing something stupid to create a layer2 loop, as we have lots of little 5 and 8 port switches on desktops and in our engineering lab - and in spite of the fact that I've set our core switch as the root of the spanning tree. I'm setting up a box to do a tcpdump in a ring buffer with smallish files so that I can do analysis on them more easily. I'm not a packet analysis guy, though I've done some looking on occasion. Anyone have thoughts on what to look for when I start my analysis? Kurt
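A note on "set our core switch as the root of the spanning tree": in 802.1D the root is simply the bridge with the numerically lowest bridge ID, which is the configured priority followed by the switch MAC address as a tie-breaker. So a low priority on the core only holds as long as no other bridge in the same spanning tree advertises a lower ID. The sketch below illustrates the comparison; the switch names, priorities, and MAC addresses are all made up for the example.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int   # configured bridge priority, lower wins
    mac: str        # tie-breaker when priorities are equal, lower wins


def bridge_id(bridge):
    """Comparable bridge ID: priority first, then MAC address bytes."""
    return (bridge.priority, bytes.fromhex(bridge.mac.replace(":", "")))


def elect_root(bridges):
    """Return the bridge that would win the root election."""
    return min(bridges, key=bridge_id)


# Hypothetical switches, not the real inventory.
bridges = [
    Bridge("core-3400cl", 4096, "00:1b:3f:aa:00:01"),
    Bridge("access-2510", 32768, "00:1b:3f:aa:00:02"),
    Bridge("lab-switch", 32768, "00:0d:bc:00:00:10"),
]
print(elect_root(bridges).name)            # core-3400cl

# A newly attached switch with a lower priority takes over as root.
rogue = Bridge("lab-rogue", 0, "00:0d:bc:00:00:99")
print(elect_root(bridges + [rogue]).name)  # lab-rogue
```

This is only the election rule; it says nothing about whether a change of root is what is happening on this network.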
Re: [NTSysADM] Semi-OT: Network problem
On Fri, Sep 20, 2013 at 2:12 PM, Kurt Buff kurt.b...@gmail.com wrote: No, the configuration of the L3 switch is stupidly simple ... Very odd that you're getting CPU spikes, then. You've done a show log -a on the switch right after the trouble and found nothing helpful, I presume? Have you checked for firmware updates? ... the backup machine's trunk is LACP ... Is the backup machine behaving itself? LACP reconfiguration prolly hits the CPU. STP will hit the CPU. But I'm shooting in the dark, here. I'd call HP support. They know what magic commands to issue to get the switch to cough up relevant debug info. No ACLs, and two routes - a DG and a static to another switch for a lab subnet. I believe routing is done on ASICs with that model anyway. -- Ben
Re: [NTSysADM] Semi-OT: Network problem
No, I figured he was having me on... Kurt On Fri, Sep 20, 2013 at 11:07 AM, Jonathan Link jonathan.l...@gmail.com wrote: Is this the equivalent of Vader saying Your powers are weak, old man to Obi Wan? On Fri, Sep 20, 2013 at 1:55 PM, Kurt Buff kurt.b...@gmail.com wrote: Sigh. Yes, but... The 5-4-3 rule was created when 10BASE5 and 10BASE2 were the only types of Ethernet network available. The rule only applies to shared-access 10 Mbit/s Ethernet backbones. The rule does not apply to switched Ethernet because each port on a switch constitutes a separate collision domain. :) Kurt On Fri, Sep 20, 2013 at 10:37 AM, Michael B. Smith mich...@smithcons.com wrote: http://en.wikipedia.org/wiki/5-4-3_rule -Original Message- From: listsad...@lists.myitforum.com [mailto:listsad...@lists.myitforum.com] On Behalf Of Kurt Buff Sent: Friday, September 20, 2013 12:59 PM To: NTSysADM@lists.myitforum.com Subject: [NTSysADM] Semi-OT: Network problem All, In the past couple of weeks, $work has had a problem with network interruptions - frequent gaps in network connectivity where all contact is lost with servers for brief periods of time (1-2 minutes, usually). I could see the gaps in the graphs on my (very new and incomplete - long story, don't ask) cacti installation. Unfortunately, I've been unable to get cacti to graph CPU utilization for the switches, because they're Procurves, and I couldn't find a working XML file or configuration for that. It's always happened while I've been unavailable, until today. Just now, I was able to show conclusively that our core layer3 switch (Procurve 3400cl-48G), which was hit hardest, spikes its CPU to 99% during these episodes. Volume of traffic is normal - no huge spikes in that, just normal variation, AFAICT, from the cacti graphs. I haven't had time to see if other switches also spike their CPU, but given the gaps in the graphs, I suspect that's the case. 
I suspect someone is doing something stupid to create a layer2 loop, as we have lots of little 5 and 8 port switches on desktops and in our engineering lab - and in spite of the fact that I've set our core switch as the root of the spanning tree. I'm setting up a box to do a tcpdump in a ring buffer with smallish files so that I can do analysis on them more easily. I'm not a packet analysis guy, though I've done some looking on occasion. Anyone have thoughts on what to look for when I start my analysis? Kurt
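On "what to look for": with a suspected layer 2 loop, one possible starting point is to count broadcast frames and spanning-tree BPDUs per second in each capture file and see whether either jumps around the time of an outage. The sketch below is a minimal reader for classic pcap files (what `tcpdump -w` normally writes) using only the Python standard library. It assumes Ethernet captures and treats frames sent to 01:80:C2:00:00:00, the IEEE bridge group address, as STP; the function names are invented for this example.

```python
import struct
from collections import Counter, defaultdict

BROADCAST = b"\xff" * 6
STP_MULTICAST = bytes.fromhex("0180c2000000")  # IEEE 802.1D bridge group address

# Classic pcap magic numbers (microsecond and nanosecond variants).
MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<", b"\x4d\x3c\xb2\xa1": "<",
    b"\xa1\xb2\xc3\xd4": ">", b"\xa1\xb2\x3c\x4d": ">",
}


def classify(frame):
    """Label a frame by its destination MAC address."""
    dst = frame[:6]
    if dst == BROADCAST:
        return "broadcast"
    if dst == STP_MULTICAST:
        return "stp"
    return "other"


def frames_per_second(pcap_bytes):
    """Return {second: Counter(kind -> frames)} for an Ethernet pcap capture."""
    endian = MAGICS.get(pcap_bytes[:4])
    if endian is None:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != 1:
        raise ValueError("expected an Ethernet capture (link type 1)")
    counts = defaultdict(Counter)
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap_bytes):
        ts_sec, _frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) >= 6:
            counts[ts_sec][classify(frame)] += 1
    return counts


def busiest_seconds(counts, kind, top=5):
    """Seconds with the most frames of one kind, highest first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1][kind], reverse=True)
    return [(sec, c[kind]) for sec, c in ranked[:top] if c[kind]]

# Usage, with a hypothetical file name:
#   data = open("loop.pcap0", "rb").read()
#   print(busiest_seconds(frames_per_second(data), "broadcast"))
```

This only gives counts per second. Reading the BPDUs themselves (who claims to be root, topology change flags) is better done in a real analyzer such as Wireshark.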
Re: [NTSysADM] Semi-OT: Network problem
On Fri, Sep 20, 2013 at 2:59 PM, Michael B. Smith mich...@smithcons.com wrote: I still use it. Violate the rule at your peril. :P Technically speaking, if you're using switches everywhere, you're still following the rule, because every link is its own collision domain. ;-) -- Ben
Re: [NTSysADM] Semi-OT: Network problem
On Fri, Sep 20, 2013 at 3:45 PM, Kurt Buff kurt.b...@gmail.com wrote: You've done a show log -a on the switch right after the trouble and found nothing helpful, I presume? On the 3400cl, 'show log' says the same as 'sho log -a' - nothing of interest. The -a just tells it to include events from before the last reboot. I threw that in in case you had rebooted trying to clear the trouble. Just that the monitor port has a high collision or drop rate once in a while, and that doesn't correlate with the network interruptions. I wouldn't *think* port mirroring would need the CPU for anything, but I don't actually know. I'm going to take a look at the packets I've captured first, and see what I can, but HP support might well be the answer. Reason I suggest calling support is they're likely to be able to tell you how to tell exactly what is causing the CPU to spike. They might not solve the root cause problem for you, but that's info you need and don't have. -- Ben
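The "doesn't correlate with the network interruptions" check can be made a little less eyeball-driven by comparing log event times against the outage windows taken from the graphs. The sketch below assumes both have already been turned into Python datetimes by hand; the timestamps and the 30-second slack are invented for illustration.

```python
from datetime import datetime, timedelta


def events_near_outages(events, outages, slack=timedelta(seconds=30)):
    """Return the events that fall inside any outage window widened by `slack`."""
    hits = []
    for ev in events:
        for start, end in outages:
            if start - slack <= ev <= end + slack:
                hits.append(ev)
                break
    return hits


# Hypothetical data: two outages and three switch log events.
outages = [
    (datetime(2013, 9, 20, 9, 58), datetime(2013, 9, 20, 10, 0)),
    (datetime(2013, 9, 20, 13, 15), datetime(2013, 9, 20, 13, 16)),
]
events = [
    datetime(2013, 9, 20, 8, 30),
    datetime(2013, 9, 20, 9, 59),
    datetime(2013, 9, 20, 15, 0),
]
hits = events_near_outages(events, outages)
print(f"{len(hits)} of {len(events)} events fall near an outage")
```

If only a small fraction of events land near an outage, that supports treating the log messages as unrelated noise, though with this few data points it is a hint rather than proof.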