Top posting and wrapping this up…

While we shop for other gear that will give us more GigE ports, I did finally 
do a swap last night - on paper, I suppose it's a downgrade, but I put a 3550 
in place of the 3560 and things do indeed look better:

http://i.imgur.com/N7Nanr6.png

The issues I saw between the G2 and the switch were resolved by removing 
shaping on a vlan - apparently that was asking too much of the NPE-G2.
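
(For the archives: the fix was just removing the service-policy from the vlan subinterface. A sketch of what that looks like - the policy and subinterface names here are made up, not our actual config:)

```
l3-router(config)# interface GigabitEthernet0/3.100
l3-router(config-subif)# no service-policy output SHAPE-CUSTOMER
```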

Charles

ps - thanks to "Tim W" for some guidance offlist

On Dec 17, 2012, at 6:01 PM, Charles Sprickman wrote:

> Ugh.  Sent this directly to Tim and not the list.
> 
> My only updates are that I have a 3550 prepped to go out there when we can 
> deal with the downtime and that the packet loss continues during the PPS 
> peaks.  I'm still confused as to why I see the discards on the 7206 side and 
> not the 3560 side (I've linked to some mrtg screencaps below showing both 
> sides of the GigE link between the 7206 and the 3560).
> 
> Thanks,
> 
> Charles
> 
> On Dec 8, 2012, at 12:07 AM, Charles Sprickman wrote:
> 
>> On Dec 7, 2012, at 4:03 AM, [email protected] wrote:
>> 
>>> I would focus on the 3560 device. These switches do not cope well with 
>>> micro-bursts. I would set up graphing on the switch ports to monitor 
>>> traffic levels, and also monitor the interface controller counters. Also, 
>>> what does "show interfaces summary" show? That gives details on rx/tx and 
>>> queued traffic on each interface.
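
[Graphing those counters can be as simple as polling ifOutDiscards over SNMP every few minutes (snmpget, MRTG, whatever you have) and differencing the samples. A minimal Python sketch of just the delta math, allowing for a 32-bit counter wrap - the polling itself is omitted:]

```python
def counter_delta(prev, curr, wrap=2**32):
    """Difference between two SNMP counter samples, allowing one 32-bit wrap."""
    return curr - prev if curr >= prev else wrap - prev + curr

def rates_per_second(samples, interval_s):
    """Per-second rates from consecutive counter samples taken interval_s apart."""
    return [counter_delta(a, b) / interval_s
            for a, b in zip(samples, samples[1:])]

# e.g. ifOutDiscards polled every 300 s:
# rates_per_second([1000, 1300, 1300], 300) -> [1.0, 0.0]
```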
>> 
>> Thanks Tim (and Phil).  I was not aware of the buffer issue; I'd always 
>> thought the 3560 was higher up the chain than the lowly 3550s we have 
>> scattered about.  We do have a few spare 3550s, so replacing this thing is 
>> certainly an easy option.
>> 
>> That said, here are some snippets of the sh int/sh controller output on both 
>> the 7206 and 3560.
>> 
>> 7206 Gi0/3:
>> (full output here: http://pastebin.com/cbpy4vkw)
>> 
>> l3-router#sh interfaces gigabitEthernet 0/3             
>> GigabitEthernet0/3 is up, line protocol is up 
>> Hardware is MV64460 Internal MAC, address is 0007.b3c3.f019 (bia 
>> 0007.b3c3.f019)
>> Description: local server subnet (native vlan), trunk to 3560
>> MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec, 
>>    reliability 255/255, txload 19/255, rxload 23/255
>> Encapsulation 802.1Q Virtual LAN, Vlan ID  1., loopback not set
>> Keepalive set (10 sec)
>> Full-duplex, 1000Mb/s, media type is RJ45
>> ?? -->>output flow-control is XON, input flow-control is unsupported
>> 
>> (that's odd, as I don't have this manually configured and it shows up 
>> nowhere else)
>> 
>> ARP type: ARPA, ARP Timeout 04:00:00
>> Last input 00:00:00, output 00:00:00, output hang never
>> Last clearing of "show interface" counters 1d04h
>> Input queue: 0/75/0/15 (size/max/drops/flushes); Total output drops: 9570 
>> <<-- 
>> 
>> (why "0/75/0/15" yet "total" 9570 drops?  what causes an output drop if 
>> there is no speed mismatch and the link is clean?)
>> 
>> Queueing strategy: fifo
>> Output queue: 0/40 (size/max)
>> 5 minute input rate 93407000 bits/sec, 14789 packets/sec
>> 5 minute output rate 76439000 bits/sec, 13517 packets/sec
>>    1017374526 packets input, 1652284061 bytes, 0 no buffer
>>    Received 55861 broadcasts, 0 runts, 0 giants, 0 throttles
>>    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
>>    0 watchdog, 1424775 multicast, 0 pause input
>>    0 input packets with dribble condition detected
>>    999128260 packets output, 2331441042 bytes, 0 underruns
>>    0 output errors, 0 collisions, 0 interface resets
>>    0 unknown protocol drops
>>    0 babbles, 0 late collision, 0 deferred
>>    0 lost carrier, 0 no carrier, 0 pause output
>>    0 output buffer failures, 0 output buffers swapped out
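>> 
[A "Total output drops" count alongside an empty input queue is consistent with microbursts briefly overflowing the 40-packet FIFO output queue, even on a clean link running well under line rate on average. A toy tick-based model - my own illustration, nothing Cisco-specific - shows the effect:]

```python
from collections import deque

def fifo_drops(arrivals, queue_max=40, drain_per_tick=1):
    """Tail-drop FIFO: arrivals[i] packets arrive in tick i; up to
    drain_per_tick packets are transmitted at the end of each tick.
    Returns the number of packets dropped."""
    q = deque()
    drops = 0
    for n in arrivals:
        for _ in range(n):
            if len(q) < queue_max:
                q.append(None)   # packet queued
            else:
                drops += 1       # queue full: tail drop
        for _ in range(min(drain_per_tick, len(q))):
            q.popleft()          # packet transmitted
    return drops

# smooth traffic at line rate: no drops
# fifo_drops([1] * 100) -> 0
# same total traffic delivered as one burst: most of it tail-drops
# fifo_drops([100] + [0] * 99) -> 60
```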
>> 
>> And just a snippet from "sh controllers", the rest is in that pastebin link:
>> 
>> throttled = 0, enabled = 0, disabled = 10
>> reset=4(init=1, restart=3), auto_restart=8
>> tx_underflow = 0, tx_overflow = 0,  tx_end_count = 1619071635 <<--???
>> 
>> (including this as I don't know what "tx_end_count" is and it's pretty high 
>> and climbing - right now it's at 1774057354 and the interface snapshots in 
>> these pastebin posts were taken around 8 hours earlier)
>> 
>> rx_nobuffer = 0, rx_overrun = 0
>> rx_no_descriptors = 0,  rx_interrupt_count = 875592461 
>> rx_crc_error = 0, rx_too_big = 0, rx_resource_error = 0
>> rx_sop_eop_error = 0
>> 
>> The paste also includes "sh interface switching" info.
>> 
>> 
>> On the 3560's port that trunks back to the 7206 I have some data as well, 
>> and I'm including highlights.  http://pastebin.com/T9R7qgdz
>> 
>> GigabitEthernet0/1 is up, line protocol is up (connected) 
>> Hardware is Gigabit Ethernet, address is 0019.062a.1d81 (bia 0019.062a.1d81)
>> Description: to router
>> MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
>>    reliability 255/255, txload 17/255, rxload 13/255
>> Encapsulation ARPA, loopback not set
>> Keepalive not set
>> Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
>> input flow-control is off, output flow-control is unsupported 
>> ARP type: ARPA, ARP Timeout 04:00:00
>> Last input 00:00:25, output 00:00:00, output hang never
>> Last clearing of "show interface" counters 1d02h
>> Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
>> Queueing strategy: fifo
>> Output queue: 0/40 (size/max)
>> 5 minute input rate 53380000 bits/sec, 8601 packets/sec
>> 5 minute output rate 69344000 bits/sec, 10519 packets/sec
>>    759096424 packets input, 576528174343 bytes, 0 no buffer
>>    Received 80421 broadcasts (33239 multicasts)
>>    0 runts, 0 giants, 0 throttles
>>    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
>>    0 watchdog, 33239 multicast, 0 pause input
>>    0 input packets with dribble condition detected
>>   887741501 packets output, 682089634960 bytes, 0 underruns
>>    0 output errors, 0 collisions, 0 interface resets
>>    0 babbles, 0 late collision, 0 deferred
>>    0 lost carrier, 0 no carrier, 0 PAUSE output
>>    0 output buffer failures, 0 output buffers swapped out
>> 
>> No sign of drops here.  Also note that it says flow control is off inbound 
>> and unsupported outbound, so I'm not sure why the 7206 indicates flow 
>> control is enabled.
>> 
>> Some "sh buffers" info (more output in the pastebin link):
>> 
>> None of the "small", "medium", "very large", etc. buffer stats show any 
>> failures, but interface buffers for a few interfaces show drops (at least 
>> I'm guessing that's what a "fallback" is):
>> 
>> Syslog ED Pool buffers, 600 bytes (total 150, permanent 150):
>>    118 in free list (150 min, 150 max allowed)
>>    35588 hits, 0 misses
>> RxQFB buffers, 2040 bytes (total 300, permanent 300):
>>    296 in free list (0 min, 300 max allowed)
>>    605798 hits, 0 misses
>> RxQ1 buffers, 2040 bytes (total 128, permanent 128):
>>    1 in free list (0 min, 128 max allowed)
>>    11937884 hits, 96720 fallbacks
>> RxQ2 buffers, 2040 bytes (total 12, permanent 12):
>>    0 in free list (0 min, 12 max allowed)
>>    12 hits, 0 fallbacks, 0 trims, 0 created
>>    0 failures (0 no memory)
>> RxQ3 buffers, 2040 bytes (total 128, permanent 128):
>>    1 in free list (0 min, 128 max allowed)
>>    17394929 hits, 382890 fallbacks
>> RxQ4 buffers, 2040 bytes (total 64, permanent 64):
>>    1 in free list (0 min, 64 max allowed)
>>    721294 hits, 11285 fallbacks
>> ...
>> "sh platform port-asic stats drop"
>> 
>> Port  0 TxQueue Drop Stats: 0
>> Port  1 TxQueue Drop Stats: 0
>> Port  2 TxQueue Drop Stats: 0
>> Port  3 TxQueue Drop Stats: 464306
>> Port  4 TxQueue Drop Stats: 424
>> Port  5 TxQueue Drop Stats: 8
>> Port  6 TxQueue Drop Stats: 13954
>> Port  7 TxQueue Drop Stats: 56
>> Port  8 TxQueue Drop Stats: 4226
>> ...
>> Port 24 TxQueue Drop Stats: 0
>> Port 25 TxQueue Drop Stats: 0
>> 
>> (I'm not even sure how the ports map here - if 0 and 1 are the GigE ports, 
>> there are no drops there, and if 24 and 25 are, same deal.)
>> 
>> I can also confirm that measurements from a host running smokeping show a 
>> definite correlation between packet loss through at least the switch (the 
>> smokeping host is connected to the switch) and an increase in packets per 
>> second.  This graph tells the story better than I can describe:
>> 
>> http://imgur.com/a/Wllr7/all
>> 
>> Note that the discards are on the 7206 side, but not the 3560.
>> 
>> I have more data, and some maddeningly inconclusive smokeping graphs that 
>> don't confirm any real pattern - at times I see loss to targets beyond one 
>> transit provider, at times beyond the other, but I also have totally 
>> lossless graphs for each.
>> 
>> If there's any more data I can provide, let me know.
>> 
>> I'm getting a 3550 ready just because I have one...
>> 
>> Thanks again,
>> 
>> Charles
>> 
>> 
>> 
>>> Tim
>>> 
>>> 
>>> 
>>> 
>>> On 7 Dec 2012, at 00:43, Charles Sprickman <[email protected]> wrote:
>>> 
>>>> I'm having a tough time finding where else to dig for the source of
>>>> packet loss on what seems like a fairly lightly-loaded network.  We
>>>> have a very simple setup with a 7206/NPE-G2.
>>>> 
>>>>                       ___________   dot1q            dot1q
>>>> Transit1 (Gi0/1)------|         |   trunk   ________  trunk
>>>>                       |  7206   |----------| 3560  |--------- MetroE
>>>> DSL Provider (Gi0/2)--|         | (Gi0/3   |_______|          (Gi0/2)
>>>>                       |_________| to Gi0/1)  |  |  \
>>>>                                              |  |   \
>>>>                                           Transit2   Servers
>>>>                                           (fa0/13,14) (fa0/1-12)
>>>> 
>>>> Our aggregate usage is under 300Mb/s.  The MetroE connection peaks
>>>> at about 120Mb/s.  The DSL link peaks at around 110Mb/s.
>>>> 
>>>> DSL subs come in as a VLAN per customer, and get a subinterface per
>>>> customer.  Each subinterface uses "ip unnumbered loopback X" where
>>>> "X" is the customer's gateway.
>>>> 
>>>> MetroE subs also come in one per VLAN and terminate on numbered
>>>> subinterfaces.  The VLANs are trunked through the switch.
>>>> 
>>>> The 3560 is set up as standard "router on a stick" - subinterfaces are
>>>> created on Gi0/3 on the 7206 for fa0/13-14 and a few other small
>>>> vlans for a handful of servers (less than 15Mb/s peak).  The native
>>>> vlan is unused.
>>>> 
>>>> CPU usage on the G2 averages about 30% at peak times of the day.
>>>> Every link here runs clean as far as "sh int" can show me.
>>>> 
>>>> During peak traffic times, however, we start seeing some light packet
>>>> loss from the server vlans to anything reached via Transit1 and to
>>>> DSL circuits (though there it's hard to prove it's not the backhaul
>>>> or customer line usage).  At the same time, a ping running to anyone
>>>> off the metro ethernet circuit is clean, as is anything reached via
>>>> Transit2.  There appears to be no loss from MetroE customers to
>>>> Transit1 destinations, nor from DSL clients to Transit1.  I just
>>>> added a bunch more targets in each area mapped out above to
>>>> smokeping to try and narrow this down, but in the meantime, what
>>>> else can I look at?  As noted, there's nothing alarming in any
>>>> interface counters here.  The pattern does seem to be that anything
>>>> in the server vlans that traverses the router/switch trunk and heads
>>>> out any other GigE interface on the router shows loss, while traffic
>>>> from the server vlans that traverses the trunk, turns back around,
>>>> and heads out another port on the 3560 does not.
>>>> 
>>>> I don't have enough hard data yet to point any fingers, but what are
>>>> some of the more low-level items to look at on the 7206 and the
>>>> 3560?
>>>> 
>>>> Thanks,
>>>> 
>>>> Charles
>>>> _______________________________________________
>>>> cisco-nsp mailing list  [email protected]
>>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>> 
> 
> 


