Hello,

I gave the 2.0.9 git snapshot
(71c9484b00b4428ae6c7d7c8eea6d96073683a54) a try tonight, and it seems
to fix the issue for me. I have not tested it on 5.10, though, as the
LTS is now 5.15. However, I did test 5.15 with 2.0.8 and saw the same
behaviour.
The VM has been up for 6h now and everything is stable. Before, the logs
were flooded within an hour.

On Fri 24 Sep 2021 23:29:25 GMT, Alarig Le Lay wrote:
> Hello,
>
> Now that the IPv6 bug is supposed to have been resolved since 5.8, I
> tried to upgrade a router from 4.14 to 5.10.
>
> Bird starts; however, while inserting routes into the FIB, I get long
> I/O loop cycles and at some point bird is unable to keep up.
> I already recompiled bird in case of a header change or something like
> that, and switched to a pre-compiled kernel; neither had any effect.
>
> When bird begins to lose track of itself, I get this kind of message:
> Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:44:43 edge04-hostzealot bird: ...
> Sep 24 08:44:43 edge04-hostzealot bird: I/O loop cycle took 28703 ms for 1 events
> Sep 24 08:44:43 edge04-hostzealot bird: Kernel dropped some netlink messages, will resync on next scan.
> Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> Sep 24 08:45:50 edge04-hostzealot bird: ...
> Sep 24 08:45:51 edge04-hostzealot bird: I/O loop cycle took 36201 ms for 1 events
>
> And then OSPF begins to flap and routes are re-calculated based on the
> remaining BGP ones.
> Sep 24 08:46:54 edge04-hostzealot bird: Next hop address 185.107.95.180 resolvable through recursive route for 185.107.92.0/22
> (I have a way more specific route in OSPF)
>
> I enabled debugging, and I can see that bird is re-scanning the entire
> kernel table when the “I/O loop” message appears:
> Sep 24 09:07:30 edge04-hostzealot bird: kernel_grt_ipv4: 1.0.0.0/24: seen
> Sep 24 09:07:30 edge04-hostzealot bird: kernel_grt_ipv4: 1.0.4.0/24: seen
>
> And it tries to insert routes that are already installed:
> Sep 24 09:08:04 edge04-hostzealot bird: kernel_grt_ipv4: 122.76.248.0/23: installing
> Sep 24 09:08:04 edge04-hostzealot bird: Netlink: File exists
>
> And then OSPF clearly goes down:
> Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Inactivity timer expired for nbr 45.91.126.248 on gre4
> Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Neighbor 45.91.126.248 on gre4 changed state from Full to Down
> Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Neighbor 45.91.126.248 on gre4 removed
>
> Here are some more detailed logs:
> https://paste.swordarmor.fr/raw/HX45
> https://paste.swordarmor.fr/raw/oM9s
>
> This server isn’t the fastest one on the market, but it is well enough
> equipped to handle full views. And with an older kernel it works very
> well.
>
> I have RRs running on 5.10 kernels, so it’s more likely a kernel issue,
> but I’m not able to determine if it’s caused by the kernel itself or by
> the way bird is using netlink.
>
> I’m using bird 2.0.8; I didn’t try an older version.
>
> Thanks a lot,
> --
> Alarig Le Lay
