Hi,
Were any suspicious messages logged at that moment?
There are some PRs related to a stuck krt queue, so you probably want to
upgrade to 10.4R10 or investigate this issue with JTAC.
https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR722890
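If it helps, these are the operational commands I would start with to see
whether the krt queue is backed up and whether rpd logged anything around
that time. This is only a sketch: the prompt is a placeholder and the output
fields differ between releases.

```
user@mx480> show krt queue
user@mx480> show krt state
user@mx480> show log messages | match krt
```

Queue counts that stay non-zero and do not drain over repeated runs would
point towards the stuck-queue PRs rather than plain slow convergence.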
On 18.07.2012 2:03, Tim Vollebregt wrote:
Hi All,
This morning, during a maintenance window, I experienced the route stall bug
that Richard has mentioned a few times already on j-nsp.
Hardware kit:
-MX480 with SCB (non-e)
-2 x RE-S-1800x4
-4 x MPC 3D 16x 10GE
Software version: 10.4R8.5
During this maintenance I was placing 2 new routing engines into the router,
replacing the 'old' RE-S-2000. This router is pushing a lot of traffic and
receiving 14 x full BGP tables from eBGP peers, and it has 1 RR session to its
'mate' and several iBGP peers with partial tables.
After replacing the REs, the FPCs initialized and the BGP sessions started to
establish, but it took quite some time before the RIB was completely filled.
After checking some hosts I came to the conclusion that some destinations were
unreachable even though the RIB was looking fine.
When I checked the FIB with 'show route forwarding-table summary', I saw that
only 11K prefixes had been pushed to the FIB and that it was hanging.
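For reference, the RIB/FIB comparison can be made roughly as follows. Both are
standard operational commands; the prompt is a placeholder and the exact
output layout varies by release.

```
user@mx480> show route summary
user@mx480> show route forwarding-table summary
```

The first shows how many routes the RIB holds per table, the second how many
have actually been installed in the forwarding table, so a large gap between
the two that does not close is the symptom described here.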
As I was aware of the bug, I waited for some time, and it eventually took about
30 minutes to fill the FIB with 414K prefixes. During these 30 minutes a lot of
destinations were unreachable and traffic was being blackholed, because routes
were still being exchanged with peers normally while the FIB was incomplete.
As there was still some time left in the maintenance window, and I really
wanted to have some workaround for dealing with this bug, I did the following:
I deactivated all eBGP peer groups and did a switchover to the other routing
engine. When the FPCs were initialized, the router started building its iBGP
sessions towards the core routers and its RR session (full table).
This worked out quite well: the FIB was filled with the full table within
5 minutes. Afterwards I activated all eBGP peer groups again and monitored the
FIB. It eventually took about 30 minutes to update the FIB with the correct
next-hops, but this time the blackholing lasted only a limited amount of time.
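Roughly, the sequence looked like the sketch below. The group name EBGP-PEERS
and the prompt are placeholders (repeat the deactivate/activate line for each
eBGP group), and 'commit synchronize' is an assumption on a dual-RE box so
that both REs hold the same configuration before the switchover.

```
[edit]
user@mx480# deactivate protocols bgp group EBGP-PEERS
user@mx480# commit synchronize

user@mx480> request chassis routing-engine master switch

(wait for the FPCs to come up and the FIB to fill from the iBGP/RR sessions)
user@mx480> show route forwarding-table summary

[edit]
user@mx480# activate protocols bgp group EBGP-PEERS
user@mx480# commit synchronize
```

The idea is simply to let the FIB fill from a single full table first, so that
forwarding works before the 14 eBGP feeds start churning next-hops.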
It seems this bug has been there since release 10.0 (MPC), and there doesn't
seem to be a fix yet. Does anyone have more information about it, PR number,
etc.? IMHO this is a really bad one, and it can be a showstopper in some cases.
Thanks for your time.
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp