Hi,
I think I was able to reliably trigger the problem sthen@
describes here:
http://marc.info/?l=openbsd-misc&m=133836636125340&w=2
We have been seeing a similar problem in production for some time, and I
was able to reproduce it with a qemu test setup.
All VMs are running:
OpenBSD 5.2-beta (GENERIC) #251: Thu Jun 28 01:30:25 MDT 2012
[email protected]:/usr/src/sys/arch/i386/compile/GENERIC
The setup:
==========
            +----------+
            | transit  |
            | AS 65001 |
            +--+---+---+
               |em0| .1
               +---+
                 |
    +------------+-----------------+
    | .2     192.168.113.0/24      | .3
  +---+                          +---+
  |em0|                          |em0|
+-+---+----+                   +-+---+----+
|    b1    |                   |    b2    |
| AS 65002 |                   | AS 65002 |
+-+---+----+                   +-+---+----+
  |em1| .1                       |em1| .2
  +---+                          +---+
    |       192.168.114.0/24       |
    +---------+--------------------+
              |
            +---+
            |em0| .10
          +-+---+----+
          |   lb1    |
          +-+-----+--+
            |carp1| 192.168.240.1/24
            +-----+
transit announces 65k routes.
For brevity I'm going to ignore b2 from now on, since I think the
problem can be triggered without b2; its config is symmetric to b1's.
[florian@openbsd-b1:~]$ sudo grep -v \# /etc/bgpd.conf | grep -v ^$
AS 65002
router-id 192.168.113.2
network 192.168.114.0/24
network 192.168.115.0/24
neighbor 192.168.113.1 {
	descr "openbsd-transit"
	announce self
	local-address 192.168.113.2
	remote-as 65001
}
neighbor 192.168.114.2 {
	descr "openbsd-b2"
	announce all
	local-address 192.168.114.1
	remote-as 65002
}
deny from any
allow from any inet prefixlen 8 - 24
allow from any inet6 prefixlen 16 - 48
allow from any prefix 0.0.0.0/0
[florian@openbsd-b1:~]$ sudo grep -v \# /etc/ospfd.conf | grep -v ^$
router-id 192.168.113.2
fib-update yes
metric 10
redistribute static
redistribute connected
redistribute default set { metric 300 type 2 }
area 0.0.0.0 {
	interface em1 { metric 5 }
}
[root@openbsd-lb1:~]# grep -v \# /etc/ospfd.conf | grep -v ^$
router-id 192.168.114.10
fib-update yes
metric 10
area 0.0.0.0 {
	interface em2 { metric 5 }
	interface em0 { demote carp }
	interface carp1 { passive }
}
------------------------------------------------------------------------
When I flap the ospf route from lb1 by running ifconfig em0 down/up on
lb1, RES jumps from 20M to 45M. Doing this often enough, I was able to
get RES to 300+M. It's a bit tricky because of the ospf router-dead-time,
but this works reliably:
while true
do
	echo ifconfig em0 down
	ifconfig em0 down
	sleep 35
	echo ifconfig em0 up
	ifconfig em0 up
	sleep 35
done
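To watch RES grow while the loop above runs, something like the
following could be used on the box running bgpd. This is only a sketch:
it assumes the bgpd processes show up in ps with a command that starts
with "bgpd" and that ps accepts -axo rss,command. The names sum_rss and
watch_rss are made up for illustration.

```shell
#!/bin/sh
# Sketch only; sum_rss and watch_rss are hypothetical helper names.

# Read "rss command..." lines on stdin and sum the RSS column (KB) of
# every process whose command starts with the given string. Matching on
# the start of the command avoids counting this awk process itself.
sum_rss() {
	awk -v pat="$1" 'index($2, pat) == 1 { sum += $1 } END { print sum + 0 }'
}

# Print one timestamped sample every $2 seconds (default 5) until
# interrupted. Assumes ps understands -axo rss,command.
watch_rss() {
	while true
	do
		echo "$(date +%H:%M:%S) $(ps -axo rss,command | sum_rss "$1") KB"
		sleep "${2:-5}"
	done
}
```

For example, "watch_rss bgpd 5" in a second terminal on b1 prints the
summed RSS of all bgpd processes every five seconds.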
After putting a lot of log statements into bgpd I see this:
From dispatch_rtmsg_addr we come to kroute_insert in kroute.c.
There, this condition is true twice:
if (h->nexthop.aid == AID_INET &&
(ntohl(h->nexthop.v4.s_addr) & mask) == ina)
Once with kr->r being 0.0.0.0/0 and once with 192.168.113.0/24.
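To illustrate why both routes match, here is the same mask comparison
redone in sh. It is a sketch under assumptions: I take the nexthop to be
192.168.113.1 (the transit neighbor), the shell is assumed to have
arithmetic wider than 32 bits, and ip2int, len2mask and covers are
made-up names, not functions from bgpd.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical, not taken from bgpd.
# Assumes shell arithmetic is wider than 32 bits.

# Dotted quad -> integer in host order. The body runs in a subshell so
# that the IFS change does not leak into the caller.
ip2int() (
	IFS=.
	set -- $1
	echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Prefix length -> netmask as an integer.
len2mask() {
	if [ "$1" -eq 0 ]; then
		echo 0
	else
		echo $(( (0xffffffff << (32 - $1)) & 0xffffffff ))
	fi
}

# covers NEXTHOP NETWORK PREFIXLEN
# Succeeds if (nexthop & mask) == network, like the check in kroute_insert.
covers() {
	_mask=$(len2mask "$3")
	[ $(( $(ip2int "$1") & _mask )) -eq "$(ip2int "$2")" ]
}

for net in 0.0.0.0/0 192.168.113.0/24 192.168.114.0/24
do
	if covers 192.168.113.1 "${net%/*}" "${net#*/}"; then
		echo "$net matches"
	fi
done
```

With the assumed nexthop this reports a match for 0.0.0.0/0 and
192.168.113.0/24 but not for 192.168.114.0/24, so a change to either of
the first two routes revalidates that nexthop.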
Via knexthop_validate(kroute.c) -> send_nexthop_update(bgpd.c)
[...] -> nexthop_update (rde_rib.c) we come to prefix_updateall
where
if (oldstate == state && state == NEXTHOP_REACH) {
is true. There we basically get a copy of the whole rib, sent as
individual imsgs, which kroute.c cannot consume fast enough. The imsgs
queue up in the rde and its RES size increases.
So for some reason ospfd touches the route on the external interface
(192.168.113.0/24) and bgpd has to validate all routes.
I'm going to poke ospfd tomorrow. AFAIC bgpd behaves correctly.
Thanks,
Florian
--
Intuition is no proof. What concrete evidence do you have that you exist?