Re: [j-nsp] route BGP stall bug

Tima Maryin Wed, 18 Jul 2012 02:43:53 -0700

Hi,


Are there any suspicious messages logged at that moment?

There are some PRs related to the krt queue getting stuck, so you probably want to upgrade to 10.4R10 or investigate this issue with JTAC.


https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR722890


On 18.07.2012 2:03, Tim Vollebregt wrote:

Hi All,

This morning during a maintenance I experienced the route stall bug Richard 
mentioned a few times already on j-nsp.

Hardware kit:
-MX480 with SCB (non-e)
-2 x RE-S-1800x4
-4 x MPC 3D 16x 10GE
Software version: 10.4R8.5
During this maintenance I was placing 2 new routing engines into the router, 
replacing the 'old' RE-S-2000. This router is pushing a lot of traffic and 
receiving 14 x full BGP tables from eBGP peers/1 RR session to its 
'mate'/several iBGP peers with partial tables.

After replacing the REs, the FPCs initialized and the BGP sessions were 
established, but it took quite some time before the RIB was completely filled. 
After checking some hosts I came to the conclusion that some destinations were 
unreachable, even though the RIB looked fine.

When checking the FIB with the command 'show route forwarding-table summary', I 
saw that only 11K prefixes had been pushed to the FIB and that it was hanging.
As I was aware of the bug I waited for some time, and it eventually took about 
30 minutes to fill the FIB with 414K prefixes. During these 30 minutes a lot of 
destinations were unreachable and traffic was being blackholed, since route 
exchange with peers was working fine.
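
The check described above (watching the FIB prefix count and noticing that it is stuck far below the expected total) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name fib_stalled and both thresholds are made up for this example, the counts are assumed to be noted by hand or by a script from repeated runs of 'show route forwarding-table summary', and nothing here parses Junos output or talks to the router.

```python
from typing import Sequence, Tuple


def fib_stalled(
    samples: Sequence[Tuple[int, int]],
    expected_prefixes: int,
    min_fill: float = 0.95,   # assumed: FIB counts as "filled" at 95% of target
    min_growth: int = 1000,   # assumed: less growth than this means "hanging"
) -> bool:
    """Return True if the FIB looks stuck.

    samples: (seconds_since_start, fib_prefix_count) pairs, oldest first.
    expected_prefixes: the prefix count the FIB should eventually reach.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to judge growth")

    first_count = samples[0][1]
    last_count = samples[-1][1]

    filled = last_count >= min_fill * expected_prefixes
    growing = (last_count - first_count) >= min_growth

    # Stalled means: not yet filled and not making meaningful progress.
    return not filled and not growing


# Counts loosely modelled on the numbers in this report (11K stuck, 414K target).
stuck = [(0, 11_000), (60, 11_000), (120, 11_200)]
filling = [(0, 11_000), (60, 150_000), (120, 300_000)]
done = [(0, 413_000), (60, 414_000)]

print(fib_stalled(stuck, 414_000))
print(fib_stalled(filling, 414_000))
print(fib_stalled(done, 414_000))
```

The two thresholds are arbitrary and would need tuning against a router's normal convergence behaviour; the point is only that "far below target and barely growing" is an easy condition to alarm on during a maintenance window.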

As there was still some time left in the maintenance window and I really wanted 
a workaround for this bug, I did the following: I deactivated all eBGP peer 
groups and did a switchover to the other routing engine. When the FPCs were 
initialized the router started building its iBGP sessions towards the core 
routers, and its RR session (full table).

This worked out quite well: the FIB was filled with the full table within 
5 minutes. Afterwards I activated all eBGP peer groups again and monitored the 
FIB; it eventually took about 30 minutes to fill the FIB with the correct 
next-hops, but this time the blackholing lasted only a limited amount of time.

It seems this bug has been there since release 10.0 (MPC), and there doesn't 
seem to be a fix yet. Does anyone have more information about it, a PR number, 
etc.?

IMHO this is a really bad one, and can be a showstopper in some cases.

Thanks for your time.


_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp
