Hi, this is an OVS issue, already discussed:
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043007.html
...
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043063.html

Official OVS quote:

> We'd accept patches to improve OVS's routing table code. It's not
> designed to scale to 1,800,000 routes. We'd also take code to suppress
> the routing table code in cases where it isn't actually needed, since
> it's not always needed. But we can't take a patch to just delete it;
> I'm sure you understand.

I tried to apply this patch at that time, but it was already useless for newer versions:
https://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20161123/5379b333/attachment.bin

Our workaround was to scale the VM to 3 vCPUs, since our average system load is 1.5 for BGP.

You can see what is happening:

[root@bgp1 ~]# top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  654 root      10 -10 1284492   1.0g  20276 R  98.0 27.0   2513:01 ovs-vswitchd
   16 root      20   0       0      0      0 S   2.0  0.0  24:45.60 ksoftirqd/1

[root@bgp1 ~]# ip route show
...
1.0.0.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
1.0.4.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
1.0.4.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
1.0.5.0/24 via 89.212.47.185 dev t2-v24-ha proto bird

Routes being constantly added and deleted:

[root@bgp1 ~]# ip monitor
...
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 2.16.70.0/23 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 88.221.28.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 23.50.188.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 92.122.68.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 88.221.100.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
Deleted 92.123.208.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
.....

Regards,
saso

> On 6 May 2019, at 19:30, Kees Meijs <[email protected]> wrote:
>
> Hi list,
>
> We're in the process of replacing Quagga with BIRD but stumble upon a
> little problem.
>
> When device scanning is on (obviously default) our testing machine
> completely fills up a CPU core. The culprit isn't BIRD itself but an
> Open vSwitch daemon.
>
> After disabling the device protocol and restarting BIRD, everything goes
> back to its quiet state.
>
> BIRD (1.6.3-2) and Open vSwitch (2.6.2~pre+git20161223-3) both were
> installed as Debian stable packages.
>
> The configuration is as simple as:
>
>> # This is a minimal configuration file, which allows the bird daemon to start
>> # but will not cause anything else to happen.
>> #
>> # Please refer to the documentation in the bird-doc package or BIRD User's
>> # Guide on http://bird.network.cz/ for more information on configuring BIRD and
>> # adding routing protocols.
>>
>> # Change this into your BIRD router ID. It's a world-wide unique identification
>> # of your router, usually one of router's IPv4 addresses.
>> router id 1.2.3.4;
>>
>> # The Device protocol is not a real routing protocol. It doesn't generate any
>> # routes and it only serves as a module for getting information about network
>> # interfaces from the kernel.
>> protocol device {
>> }
>>
>> # The Kernel protocol is not a real routing protocol. Instead of communicating
>> # with other routers in the network, it performs synchronization of BIRD's
>> # routing tables with the OS kernel.
>> protocol kernel {
>>     metric 64;    # Use explicit kernel route metric to avoid collisions
>>                   # with non-BIRD routes in the kernel routing table
>>     import none;
>>     export all;   # Actually insert routes into the kernel routing table
>> }
>>
>> protocol bgp test {
>>     description "BGP test";
>>     local as REDACTED;
>>     neighbor 1.2.3.4 as REDACTED;
>>     direct;
>>     next hop self;
>>     deterministic med on;
>>     export none;
>>     import all;
>> }
>
> Meanwhile log messages such as below arise:
>
>> bird: Kernel dropped some netlink messages, will resync on next scan.
>
> For a test I deleted all existing Open vSwitch bridges and the load
> dropped again. After adding an empty new bridge, the load spikes again
> in an instant.
>
> This is unexpected behaviour. Maybe it's an implementation problem in
> Open vSwitch or maybe in BIRD. Anyway, it shouldn't happen I guess.
>
> Any clues?
>
> Thanks in advance!
>
> Regards,
> Kees
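
P.S. To put a number on the route churn visible in the ip monitor output above, something like the following rough Python sketch might help. It is only an illustration, not part of OVS or BIRD: the function names (parse_event, churn, flapping) are made up for this example, and it assumes the input lines come from `ip monitor route`, where a deletion starts with "Deleted" and an addition starts with the prefix itself. Lines that do not start with a prefix (for example route types such as "unreachable") are simply skipped.

```python
import ipaddress
from collections import Counter


def is_prefix(token):
    """True if token looks like a route prefix as printed by iproute2."""
    if token == "default":
        return True
    try:
        ipaddress.ip_network(token, strict=False)
        return True
    except ValueError:
        return False


def parse_event(line):
    """Return ("add" | "del", prefix) for one monitor line, or None."""
    fields = line.split()
    if not fields:
        return None
    if fields[0] == "Deleted":
        # Deletions look like: "Deleted <prefix> via <gw> dev <if> ..."
        if len(fields) > 1 and is_prefix(fields[1]):
            return ("del", fields[1])
        return None
    # Additions (or replacements) start directly with the prefix.
    if is_prefix(fields[0]):
        return ("add", fields[0])
    return None


def churn(lines):
    """Count add/del events per prefix."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event is not None:
            counts[event] += 1
    return counts


def flapping(counts):
    """Prefixes that were both added and deleted in the sample."""
    added = {prefix for (action, prefix) in counts if action == "add"}
    deleted = {prefix for (action, prefix) in counts if action == "del"}
    return sorted(added & deleted)


# A few lines copied from the ip monitor output earlier in this mail.
SAMPLE = [
    "Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium",
    "2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium",
    "Deleted 68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird",
    "68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird",
    "Deleted 2.16.70.0/23 via 89.212.47.185 dev t2-v24-ha proto bird",
]

counts = churn(SAMPLE)
print("events:", sum(counts.values()))
print("flapping prefixes:", flapping(counts))
```

Feeding it a minute or so of captured monitor output instead of SAMPLE should give a rough events-per-minute figure, which is presumably the update rate ovs-vswitchd has to keep up with.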
