Hi,

I've recently had a very odd problem. We have two different POPs which are 
connected via Equinix Cross Connect at L2. The two boxes, both ASR1002-X 
running 15.6(1)S1, have an iBGP session between their loopback interfaces. The 
routers have many ways to reach each other indirectly and 2 ways to reach each 
other directly via the ECX. 

So at the moment, from each router's perspective, there are 2 equal-cost paths 
to the other's loopback, and the iBGP session will establish over one of the 
links, depending on how the router picks the link.

The session between these two peers started flapping continuously, at 
intervals roughly equal to the hold timer (15s). In the output of "show ip bgp 
neighbor <ip>" I saw the "Keepalives are temporarily in throttle due to closed 
TCP window" message. From what I found on Google, this seems to be an 
MTU-related issue (which is weird: this session has been in place for some 
time, so why an MTU problem now?). Just to check, I pinged from both ends with 
the DF bit set and with packets of 7000+ bytes (we have jumbos), and the 
packets got through without problems.

When doing a more prolonged ping, though, we did notice several packets lost at 
sporadic intervals. The packet size had no bearing on whether we experienced 
packet loss, so I am assuming something is wrong on the circuit itself. The 
interesting discovery is that the BGP session was going via this specific "bad" 
link. Once we took that link out of the IGP, the session established via 
another link with no issues whatsoever. 

I couldn't remember whether TCP closes the window when it experiences packet 
loss. From my research, it seems the TCP (receive) window size is not affected 
by packet loss; it is just an indication of the free buffer space at the 
receiver. The congestion window is what actually changes when packets are 
lost. Considering I lost in general 1 packet out of 10-15 or so, let's say the 
cwnd never got that big; that still wouldn't explain the message I saw on the 
router.
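
To show what I mean, here is a toy Python sketch of the two windows. It is not 
a real TCP stack and says nothing about how IOS implements this: the function 
name, the simplified timeout-style reaction to loss and all the numbers are my 
own assumptions, chosen only to illustrate the distinction.

```python
def simulate(loss_rounds, rounds=12, rwnd=16, ssthresh=8):
    """Toy model: return (cwnd, rwnd, usable window) for each round trip.

    loss_rounds: set of round indices in which the sender detects a loss.
    rwnd stays constant because this model assumes the receiving
    application always drains its buffer. Windows are counted in segments.
    """
    cwnd = 1
    history = []
    for rnd in range(rounds):
        # The sender may have at most min(cwnd, rwnd) segments in flight.
        history.append((cwnd, rwnd, min(cwnd, rwnd)))
        if rnd in loss_rounds:
            # Simplified timeout-style reaction: halve the threshold
            # and restart from one segment.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd *= 2   # slow start
        else:
            cwnd += 1   # congestion avoidance
    return history


if __name__ == "__main__":
    for cwnd, rwnd, usable in simulate({3, 7}):
        print(f"cwnd={cwnd:2d}  rwnd={rwnd:2d}  usable={usable:2d}")
```

In this model, loss only ever shrinks cwnd. The advertised window stays where 
it is and the usable window never drops to zero, which is why the "closed TCP 
window" wording confuses me.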

Why would the window size be 0 due to packet loss? I don't get it. Either 
there's a problem with that message or with my understanding of how everything 
works. Again, MTU does not appear to be at fault, as I have tested this.

Any info or insight you might be able to provide would be deeply appreciated.

Thanks,

Vlad.
_______________________________________________
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/