On 11.01.24 21:34, Thomas Steen Rasmussen via Bird-users wrote:
Hello :)
Yesterday one of my FreeBSD routers stopped forwarding because it
ran out of mbuf clusters. It usually operates far from the limit, but
something is (was) leaking mbuf clusters badly, and I suspect it
might be bird.
----
Some background:
Due to a missing/misconfigured kernel export filter, bird was
repeatedly trying to export some routes to the kernel that the kernel
already knew about (from statically configured blackhole routes). As a
result, these errors have been repeating in the logs for some time (more than a year):
Jan 11 19:09:04 dgncr2a bird[30963]: KRT: Error sending route 2a09:94c0::/29 to kernel: File exists
Jan 11 19:10:04 dgncr2a syslogd: last message repeated 1 times
Jan 11 19:10:04 dgncr2a bird[30963]: KRT: Error sending route 85.209.116.0/22 to kernel: File exists
Jan 11 19:11:04 dgncr2a syslogd: last message repeated 1 times
Over the holidays I upgraded bird from 2.0.9 to 2.14 and also
upgraded FreeBSD from 13-STABLE-384a885111ad to
13-STABLE-2cbd132986a7. I suspect one of these two changes made this
problem appear. I made no changes to the bird or router config other than
the upgrades.
----
The mbuf cluster leak was severe: about 8-10 clusters per second, at
a fairly steady rate. The kern.ipc.nmbclusters limit on my routers was
around 2 million, and I have now raised it to 4 million.
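To put that rate in perspective, here is a rough Python sketch, assuming FreeBSD's `netstat -m` prints its cluster line in the usual `current/cache/total/max ... mbuf clusters in use` form. It parses that line, derives a leak rate from two samples and estimates how long the remaining headroom lasts at a constant rate. The function names are hypothetical helpers for illustration, and a real leak need not be perfectly linear.

```python
import re
from typing import Tuple

# Assumed FreeBSD `netstat -m` format, e.g.:
# "1024/1534/2558/2000000 mbuf clusters in use (current/cache/total/max)"
CLUSTER_RE = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def parse_clusters(netstat_m_output: str) -> Tuple[int, int]:
    """Return (current, max) mbuf clusters from `netstat -m` text."""
    m = CLUSTER_RE.search(netstat_m_output)
    if m is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, _cache, _total, limit = (int(g) for g in m.groups())
    return current, limit


def leak_rate(earlier: int, later: int, seconds: float) -> float:
    """Average clusters leaked per second between two samples."""
    return (later - earlier) / seconds


def hours_to_exhaustion(current: int, limit: int, leak_per_sec: float) -> float:
    """Hours until `current` reaches `limit` at a constant leak rate."""
    if leak_per_sec <= 0:
        return float("inf")  # not leaking (or shrinking): never exhausted
    return (limit - current) / leak_per_sec / 3600.0
```

As a hypothetical example, at 10 clusters per second a headroom of 2,000,000 clusters lasts `hours_to_exhaustion(0, 2_000_000, 10)`, roughly 55.6 hours, and doubling the limit to 4 million only stretches that to about 111 hours. That is why a router that normally sits far below the limit can still run out within a few days.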
Since I had no idea what was causing the leak and I was desperate for
a fix, at one point I tried adding the missing kernel export filter (to
at least silence the noisy warnings in the logs), and imagine my
surprise when the mbuf cluster leak stopped.
I tried removing the filters again: the leak started again, and it stopped
again when I re-added the filters. It appears that some combination of bird
2.14 and exporting routes already present in the kernel leaks
mbuf clusters like crazy.
I have no idea whether this is a bird or a FreeBSD problem, but I have to
start somewhere :) I can test things to some extent, but the routers
are in production (BGP with 1 eBGP and 1 iBGP peer and no full table),
so nothing too wild.
Can you please also open a FreeBSD PR for this? It looks like bird is
only the reproducer for a kernel bug. Can you periodically record the
`vmstat -m` and `netstat -m` output while the resources are being leaked?
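One way to do that periodic recording is a small Python loop. This is a sketch, assuming Python 3 is available on the router and the stock `netstat` and `vmstat` binaries are in `PATH`. `snapshot`, `record`, the log path and the 60-second interval are illustrative choices, not anything specified in the thread; a cron job or shell loop would serve equally well.

```python
import subprocess
import time
from datetime import datetime, timezone
from typing import Callable, Optional, Sequence

# The two commands requested in the thread; assumed to be in PATH.
COMMANDS: Sequence[Sequence[str]] = (["netstat", "-m"], ["vmstat", "-m"])


def snapshot(commands=COMMANDS, run: Callable = subprocess.run) -> str:
    """Run each command once and return a single timestamped text record."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    parts = [f"=== {stamp} ==="]
    for cmd in commands:
        result = run(list(cmd), capture_output=True, text=True, check=False)
        parts.append(f"--- {' '.join(cmd)} (exit {result.returncode}) ---")
        parts.append(result.stdout.rstrip())
    return "\n".join(parts) + "\n"


def record(path: str, interval_s: float = 60.0, samples: Optional[int] = None,
           commands=COMMANDS, run: Callable = subprocess.run) -> None:
    """Append a snapshot to `path` every `interval_s` seconds.

    Runs until interrupted when `samples` is None, otherwise stops
    after writing `samples` records.
    """
    taken = 0
    while samples is None or taken < samples:
        # Open per sample so each record is flushed and closed immediately.
        with open(path, "a", encoding="utf-8") as log:
            log.write(snapshot(commands, run))
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)


if __name__ == "__main__":
    record("/var/tmp/mbuf-leak.log")  # hypothetical path; stop with Ctrl-C
```

Comparing the `mbuf clusters in use` line across successive records, with and without the kernel export filter in place, should show whether the count keeps climbing only while the duplicate routes are being exported.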