I tried to enable rib-sharding on several routers over the last few weeks and ran into a bunch of problems.

First, a PE router with rib-sharding was losing connectivity to indirect routes after every MPLS LSP autobandwidth adjustment. Say PE-A has a static route for X.X.X.X/29 pointing to IP Y.Y.Y.1, reachable via a connected interface with IP Y.Y.Y.0/31. PE-A advertises X.X.X.X/29 with next-hop Y.Y.Y.1, and Y.Y.Y.0/31 with next-hop Z.Z.Z.Z/32, the lo0 address used as the iBGP session source. PE-B resolves Z.Z.Z.Z/32 via an RSVP LSP with label L0, and X.X.X.X/29 is resolved via Y.Y.Y.1 via Z.Z.Z.Z/32 to the same label L0. When a regular autobandwidth adjustment happens, PE-B calculates and signals the new path with label L1 using make-before-break, and then switches traffic to the new path by updating the label from L0 to L1 for the prefixes that use it. It turns out that the label is updated for Z.Z.Z.Z/32 and Y.Y.Y.0/31, but not for X.X.X.X/29. After the hold-down timer expires, PE-B signals deletion of the path with label L0 but still uses L0 for X.X.X.X/29, and traffic is blackholed because the downstream router has already deleted the label. Disabling rib-sharding on PE-B solved this issue right away.

Next, a memory leak happened on a non-RR router, with memory usage growing from 17% to 95% in three weeks. After disabling rib-sharding, memory usage is at 14% so far.

And finally, two regional route reflectors without rib-sharding, peered with central RRs that had sharding enabled, went to 100% CPU utilization right after the BGP sessions were established. This caused very slow route updates and intermittent connectivity, even for routes that hadn't changed. Changes were reverted on one of these routers, and the other one kept running at 100% RE CPU until rib-sharding was disabled on one of the central RRs. After disabling rib-sharding on one central RR, CPU on the peered regional RR dropped to 30-40% but was still higher than usual. Only when rib-sharding was disabled on the second central RR did CPU utilization return to the normal 20-25%.
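
To make the first problem (the stale label) easier to follow, here is a minimal Python sketch of the resolution chain on PE-B. It is only a toy model under my own assumptions, not how rpd works internally: the names `ToyRib`, `relabel`, `max_depth` and `blackholed` are hypothetical, and treating the failure as "only the first level of indirection gets re-resolved" is just one way to reproduce what was observed.

```python
# Toy model of recursive next-hop resolution on PE-B.
# All names are hypothetical; this does not reflect rpd internals.

class ToyRib:
    def __init__(self):
        self.lsp_label = {}   # LSP egress prefix -> label currently signalled
        self.via = {}         # prefix -> prefix it resolves through
        self.installed = {}   # prefix -> label programmed for forwarding

    def add_lsp(self, egress, label):
        self.lsp_label[egress] = label
        self.installed[egress] = label

    def add_route(self, prefix, resolves_via):
        self.via[prefix] = resolves_via
        self.installed[prefix] = self.resolve(prefix)

    def resolve(self, prefix):
        # Follow the chain of indirect next hops down to the LSP egress.
        while prefix not in self.lsp_label:
            prefix = self.via[prefix]
        return self.lsp_label[prefix]

    def depth(self, prefix):
        # Number of indirections between this prefix and the LSP egress.
        hops = 0
        while prefix not in self.lsp_label:
            prefix = self.via[prefix]
            hops += 1
        return hops

    def relabel(self, egress, new_label, max_depth=None):
        # Make-before-break switchover to the new path's label.
        # max_depth=None re-resolves every dependent prefix (expected).
        # max_depth=1 is an assumed way to mimic the observed behaviour.
        self.lsp_label[egress] = new_label
        self.installed[egress] = new_label
        for prefix in self.via:
            if max_depth is None or self.depth(prefix) <= max_depth:
                self.installed[prefix] = self.resolve(prefix)


def blackholed(rib, live_labels):
    # Prefixes still forwarding on a label the downstream router removed.
    return [p for p, label in rib.installed.items() if label not in live_labels]


def build():
    rib = ToyRib()
    rib.add_lsp("Z.Z.Z.Z/32", "L0")
    rib.add_route("Y.Y.Y.0/31", "Z.Z.Z.Z/32")   # resolved via the lo0 address
    rib.add_route("X.X.X.X/29", "Y.Y.Y.0/31")   # resolved via Y.Y.Y.1
    return rib


if __name__ == "__main__":
    rib = build()
    rib.relabel("Z.Z.Z.Z/32", "L1", max_depth=1)
    # After the hold-down timer only L1 is still valid downstream.
    print(blackholed(rib, live_labels={"L1"}))
```

With `max_depth=None` every dependent prefix moves to L1 and nothing is left on the old label, which is the behaviour seen once rib-sharding was disabled.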

YMMV, but I don't think we're going to try this feature again in the foreseeable future.

Kind regards,
Andrey

Luca Salvatore wrote on 2024-06-26 15:18:
For what it's worth, we're happily running rib-sharding on many MX10K
devices on 22.2R3-S2.
NSR is fine and we haven't had any issues.

On Sun, Jun 2, 2024 at 10:26 PM Gustavo Santos via juniper-nsp
<[email protected]> wrote:

I tried it again on JUNOS 21.4R3-S3.4, hit some bugs that crashed the
rpd daemon, and gave up.

We will try it again later this year. If update threading /
rib-sharding works as expected, it will be better than having
non-stop routing running.

The last time we had an issue caused by a BGP routing update, it took
about 50 minutes to advertise all the needed routes to one of the
transit providers, because of the time it takes to send full routing
table feeds to remote peers.

On Fri, May 10, 2024 at 16:45, Andrey Kostin via juniper-nsp
<[email protected]> wrote:

Hi juniper-nsp,

Just hit exactly the same issue as described in this message found in
the list archives:

Gustavo Santos
Mon Jan 4 15:13:18 EST 2021

Hi,

We got another MX10003 and we are updating it before it goes into
production. Reading the 19.4R3 release notes, we noticed two features,
update-threading and rib-sharding, and I really liked what they
"promise": faster BGP updates.

But there is a catch: we can't use these new features with non-stop
routing enabled.

The question is, are these features worth the loss of non-stop
routing?

Regards
"
bgp {
    ##
    ## Warning: Can't be configured together with routing-options nonstop-routing
    ##
    rib-sharding;
    ##
    ## Warning: Update threading can't be configured together with routing-options nonstop-routing
    ##
    update-threading;
}
"

That message doesn't seem to have received any response.
However, I found an explanation at the bottom of the page:



https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/rib-sharding-edit-protocols-bgp.html
Support for NSR with sharding introduced in Junos OS Release 22.2.
BGP sharding supports IPv4, IPv6, L3VPN and BGP-LU from Junos OS
Release 20.4R1.

I still need to test and confirm on this platform, but on another
router it already works.

--
Kind regards,
Andrey

_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp
