On Fri, 6 Nov 2020 13:53:58 +0100 Jesper Dangaard Brouer <[email protected]> wrote:
> [...]
> > > Could this be related to netlink? I have gobgpd running on these
> > > routers, which injects routes via netlink.
> > > But the churn rate during the tests is very minimal, maybe 30 - 40
> > > routes every second.

Yes, this could be related. The internal data structure for FIB lookups
is a fib_trie, which is a compressed patricia tree, related to the radix
tree idea. Thus, I can imagine that the kernel has to rebuild/rebalance
the tree with all these updates.

Reading the kernel code: the IPv4 fib_trie code is very well tuned and
fully RCU-ified, meaning the read side is lock-free. The resize()
function in net/ipv4/fib_trie.c has a max_work limiter to avoid it
using too much time, and the update path also looks lock-free.

The IPv6 update path looks more scary, as it seems to take a "bh"
spinlock that can block softirq from running, in net/ipv6/ip6_fib.c
(spin_lock_bh(&f6i->fib6_table->tb6_lock)).

Have you tried to use 'perf record' to observe what is happening on the
system while these latency incidents happen? (Let me know if you want
some cmdline hints.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat
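[Editor's note: a minimal sketch of the kind of 'perf record' session the
mail above suggests. It assumes perf is installed and that you have
sufficient privileges for system-wide sampling; the 10-second window and
the output filter are arbitrary illustrative choices, not from the mail.]

```shell
#!/bin/sh
# Sample all CPUs with call graphs while a latency incident is (hopefully)
# occurring. -a = system-wide, -g = record stack traces.
if command -v perf >/dev/null 2>&1; then
    # Record for 10 seconds into perf.data (requires root or
    # relaxed kernel.perf_event_paranoid).
    perf record -a -g -- sleep 10 || echo "perf record failed (privileges?)"

    # Summarize where CPU time went; softirq/FIB-related symbols such as
    # fib_table_insert or fib6_add showing up high would support the
    # route-churn theory discussed above.
    perf report --stdio --sort comm,symbol | head -40
else
    echo "perf not installed"
fi
```

Running 'perf record' during a quiet period as well gives a baseline to
compare against.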
