This afternoon I stumbled across a problem with an LDP session between a 7613 and a 7201. Actually, both LDP and iBGP were flapping every 10 seconds or so. Both interfaces are configured for MPLS, LDP and IS-IS (with authentication and BFD, though BFD isn't enabled on the interface itself yet), with an interface MTU of 9000 and a CLNS MTU of 1496. Nothing too fancy. Globally, both systems are configured with LDP as the label protocol, LDP graceful restart, no mpls ip propagate-ttl, and the LDP router ID forced to Loopback0:

# 7201
mpls label protocol ldp
no mpls ip propagate-ttl
mpls ldp graceful-restart
mpls ldp router-id Loopback0 force

# 7613
mls mpls tunnel-recir
mpls traffic-eng tunnels
mpls ldp graceful-restart
no mpls ip propagate-ttl
mpls label protocol ldp
mpls ldp router-id Loopback0 force

This morning at 7:05 the router stopped responding to SNMP queries for about 15 minutes. Before that, the load was about 13. Cacti shows the load doubling in the 10 minutes before the 15-minute gap. When the router came back, the load was just shy of 50 and stayed there for about 30 minutes. After that it sat at around 30-35 for the next 7.5 hours, until I noticed the BGP flapping and shut down the peer for troubleshooting. The load then dropped back to around 16, which is still higher than it was before this morning's hiccup. I'm at a loss to explain why the load has been so jacked. My best guess is that the 30-35 load was caused by the BGP flapping and the slightly higher load now is due to the LDP flapping.

Does anyone know how to troubleshoot an LDP neighbor flapping issue? The 7613 is logging this:

730278: Mar 4 20:43:48.696 CST: LDP GR: Received FT Sess TLV from 10.64.0.34:0 (fl 0x1, rs 0x0, rconn 0, rcov 120000)
730279: Mar 4 20:43:48.696 CST: LDP GR: MFI cutover wait delay = 600000, Forwarding State Hold Timer = 600000
730280: Mar 4 20:43:48.696 CST: LDP GR: searching for down nbr record (10.64.0.34:0, 10.64.0.178)
730281: Mar 4 20:43:48.696 CST: LDP GR: Added FT Sess TLV (Rconn 120000, Rcov 0) to INIT msg to 10.64.0.34:0

The 7201 is logging this:

054705: Mar  5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0:: lost
054706: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: created [1 total]
054707: Mar 5 00:28:19 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst. 3): interrupted--recovery pending
054708: Mar 5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0:: bindings retained
054709: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: state change (None -> Reconnect-Wait)
054710: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: reconnect timer started [120000 msecs]
054711: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: added to bindings task queue [1 entries]
054712: Mar 5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 (0) is DOWN (Received error notification from peer: Shut down)

054713: Mar 5 00:28:25.923 CST: LDP GR: searching for down nbr record (10.64.0.20:0, 10.64.0.179)
054714: Mar 5 00:28:25.923 CST: LDP GR: search for down nbr record (10.64.0.20:0, 10.64.0.179) returned 10.64.0.20:0
054715: Mar 5 00:28:25.923 CST: LDP GR: Added FT Sess TLV (Rconn 0, Rcov 120000) to INIT msg to 10.64.0.20:0
054716: Mar 5 00:28:25.947 CST: LDP GR: Received FT Sess TLV from 10.64.0.20:0 (fl 0x1, rs 0x0, rconn 120000, rcov 0)
054717: Mar 5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0:: established
054718: Mar 5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0:: found down nbr 10.64.0.20:0
054719: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: reconnect timer stopped
054720: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: state change (Reconnect-Wait -> Recovering)
054721: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: recovery timer started [1 msecs]
054722: Mar 5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst. 4): starting graceful recovery
054723: Mar 5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 (4) is UP
054724: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: recovery timer expired
054725: Mar 5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst. 4): completed graceful recovery
054726: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: destroying record [0 left]
054727: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: state change (Recovering -> Delete-Wait)

054728: Mar 5 00:28:28.091 CST: LDP GR: Tagcon querying for up to 12 bindings update tasks [table 0]
054729: Mar 5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0:: requesting bindings DEL for {10.64.0.20:0, 3}
054730: Mar 5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0:: removed from bindings task queue [0 entries]
054731: Mar 5 00:28:28.091 CST: LDP GR: Requesting 1 bindings update tasks [0 left in queue]
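To put a number on each flap as seen from the 7201, the %LDP-5-NBRCHG lines can be paired up and the down time measured. This is a minimal Python sketch, assuming the log format is exactly what is quoted in this post; the regex and the helpers `neighbor_events` and `outages` are hypothetical, and because the log carries no year, a placeholder year is inserted before parsing. For the two lines quoted here it reports the 7613's LDP ID (10.64.0.20:0) as down for about 6 seconds.

```python
import re
from datetime import datetime

# The two %LDP-5-NBRCHG lines logged by the 7201, copied from the post.
LINES = [
    "054712: Mar 5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 (0) "
    "is DOWN (Received error notification from peer: Shut down)",
    "054723: Mar 5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0 (4) is UP",
]

# Hypothetical pattern fitted to the lines above; fractional seconds are optional.
NBRCHG = re.compile(
    r"(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})(?:\.\d+)? \w+: "
    r"%LDP-5-NBRCHG: LDP Neighbor (?P<peer>\S+) \(\d+\) is (?P<state>UP|DOWN)"
)

# The log has no year, so a placeholder is supplied (assumption). A leap year
# is used so that a "Feb 29" stamp would still parse.
PLACEHOLDER_YEAR = 2000


def neighbor_events(lines):
    """Yield (timestamp, peer, state) for each LDP neighbor change line."""
    for line in lines:
        m = NBRCHG.search(line)
        if m:
            stamp = f"{PLACEHOLDER_YEAR} {m['mon']} {m['day']} {m['time']}"
            ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
            yield ts, m["peer"], m["state"]


def outages(events):
    """Pair each DOWN with the next UP for the same peer; return seconds down."""
    down_at = {}
    result = []
    for ts, peer, state in events:
        if state == "DOWN":
            down_at[peer] = ts
        elif peer in down_at:
            result.append((peer, (ts - down_at.pop(peer)).total_seconds()))
    return result


print(outages(neighbor_events(LINES)))  # [('10.64.0.20:0', 6.0)]
```

Run over a longer `show logging` capture, the same pairing would also give the interval between successive DOWN events, which is a direct check on the "every 10 seconds or so" estimate.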

10.64.0.20 is a loopback on the 7613 and 10.64.0.34 is a loopback on the 7201.
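With the loopbacks mapped to routers, the two "Received FT Sess TLV" lines can be compared side by side. This is a minimal Python sketch, assuming that rconn and rcov are the FT Reconnect Timeout and Recovery Time fields of the FT Session TLV described in RFC 3478; that is an interpretation of the abbreviations, not something the log states. The regex and `parse_ft_tlv` are hypothetical and fitted only to the two lines quoted in this post.

```python
import re

# Loopback-to-router mapping given in the post.
ROUTERS = {"10.64.0.20": "7613", "10.64.0.34": "7201"}

# The "Received FT Sess TLV" debug line logged by each router, keyed by receiver.
RECEIVED = {
    "7613": "730278: Mar 4 20:43:48.696 CST: LDP GR: Received FT Sess TLV from "
            "10.64.0.34:0 (fl 0x1, rs 0x0, rconn 0, rcov 120000)",
    "7201": "054716: Mar 5 00:28:25.947 CST: LDP GR: Received FT Sess TLV from "
            "10.64.0.20:0 (fl 0x1, rs 0x0, rconn 120000, rcov 0)",
}

# Hypothetical pattern fitted to these two lines, not a documented format.
FT_TLV = re.compile(
    r"Received FT Sess TLV from (?P<ip>[\d.]+):\d+ "
    r"\(fl (?P<fl>0x[0-9a-fA-F]+), rs (?P<rs>0x[0-9a-fA-F]+), "
    r"rconn (?P<rconn>\d+), rcov (?P<rcov>\d+)\)"
)


def parse_ft_tlv(line):
    """Return the FT session TLV fields from one debug line, or None."""
    m = FT_TLV.search(line)
    if m is None:
        return None
    return {
        "sender": ROUTERS.get(m["ip"], m["ip"]),
        "flags": int(m["fl"], 16),
        "rs": int(m["rs"], 16),
        "reconnect_ms": int(m["rconn"]),  # assumed: FT Reconnect Timeout
        "recovery_ms": int(m["rcov"]),    # assumed: Recovery Time
    }


for receiver, line in RECEIVED.items():
    tlv = parse_ft_tlv(line)
    print(f"{receiver} got from {tlv['sender']}: flags={tlv['flags']:#x}, "
          f"reconnect={tlv['reconnect_ms']} ms, recovery={tlv['recovery_ms']} ms")
```

It prints the 7613 receiving reconnect=0 ms, recovery=120000 ms from the 7201, and the 7201 receiving reconnect=120000 ms, recovery=0 ms from the 7613, which is just the raw log values laid out per direction.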

I also have some interface errors that I can't explain, though they don't appear to be incrementing. 7613:

GigabitEthernet9/1 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 001a.3063.0a80 (bia 001a.3063.0a80)
  Description: TO 2821-2.dc Gi0/0
  Internet address is 10.64.0.179/31
  MTU 9000 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:02, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/1936665/7581 (size/max/drops/flushes); Total output drops: 4
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 49000 bits/sec, 17 packets/sec
  5 minute output rate 56000 bits/sec, 24 packets/sec
  L2 Switched: ucast: 52903876 pkt, 3771470311 bytes - mcast: 15056043 pkt, 1653756471 bytes
  L3 in Switched: ucast: 80170438 pkt, 12709078926 bytes - mcast: 0 pkt, 0 bytes mcast
  L3 out Switched: ucast: 185161821 pkt, 36022953056 bytes mcast: 0 pkt, 0 bytes
     150040994 packets input, 30087625055 bytes, 0 no buffer
     Received 15660647 broadcasts (0 IP multicasts)
     30 runts, 4247159 giants, 0 throttles
     1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     257650143 packets output, 64726258058 bytes, 0 underruns
     2 output errors, 0 collisions, 2 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

7201:
GigabitEthernet0/0 is up, line protocol is up
  Hardware is MV64460 Internal MAC, address is 0023.5ee9.ac1b (bia 0023.5ee9.ac1b)
  Description: TO 7613-2.clr Gi9/1
  Internet address is 10.64.0.178/31
  MTU 9000 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is RJ45
  output flow-control is XON, input flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/3951/0 (size/max/drops/flushes); Total output drops: 6
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 45000 bits/sec, 19 packets/sec
  5 minute output rate 64000 bits/sec, 13 packets/sec
     51466122 packets input, 1916487584 bytes, 0 no buffer
     Received 1891956 broadcasts, 0 runts, 0 giants, 0 throttles
     5 input errors, 0 CRC, 0 frame, 0 overrun, 5 ignored
     0 watchdog, 2247902 multicast, 0 pause input
     0 input packets with dribble condition detected
     32927369 packets output, 1549013167 bytes, 0 underruns
     8 output errors, 0 collisions, 1 interface resets
     23 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     8 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out
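The two sets of counters are easier to compare as ratios. This Python sketch uses only the numbers from the two outputs above; `summarize` is a hypothetical helper. It shows input errors at roughly 1.29% of all input packets on the 7613's Gi9/1 versus effectively zero on the 7201's Gi0/0, and that 1,928,990 of the 7613's 1,929,071 input errors are not accounted for by the CRC, frame, overrun or ignored counters. Since neither interface has ever had its counters cleared, these are lifetime totals and say nothing on their own about whether the errors are still incrementing.

```python
# Lifetime counters from the two "show interface" outputs in the post
# (neither interface has ever had its counters cleared).
COUNTERS = {
    "7613 Gi9/1": {"packets_input": 150040994, "input_errors": 1929071,
                   "crc": 68, "frame": 0, "overrun": 13, "ignored": 0},
    "7201 Gi0/0": {"packets_input": 51466122, "input_errors": 5,
                   "crc": 0, "frame": 0, "overrun": 0, "ignored": 5},
}


def summarize(c):
    """Input errors as a percentage of input packets, and how many input
    errors are not covered by the CRC/frame/overrun/ignored counters."""
    itemized = c["crc"] + c["frame"] + c["overrun"] + c["ignored"]
    return {
        "error_pct": 100.0 * c["input_errors"] / c["packets_input"],
        "not_itemized": c["input_errors"] - itemized,
    }


for name, c in COUNTERS.items():
    s = summarize(c)
    print(f"{name}: {s['error_pct']:.2f}% input errors, "
          f"{s['not_itemized']} not in CRC/frame/overrun/ignored")
```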


Any thoughts on what's going on here? I can't tell for certain which of the two routers is causing LDP and BGP to drop, and knowing that would help me narrow my troubleshooting focus. The 7600 is running SRB1 and the 7201 is running 12.4(15)T7.

Thanks
 Justin

_______________________________________________
cisco-nsp mailing list  [email protected]
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
