Hi,

I think I was able to reliably trigger the problem sthen@
describes here:
http://marc.info/?l=openbsd-misc&m=133836636125340&w=2

We have been seeing a similar problem in production for some time, and I
was able to reproduce it with a qemu test setup.

All vms are running:

OpenBSD 5.2-beta (GENERIC) #251: Thu Jun 28 01:30:25 MDT 2012
    [email protected]:/usr/src/sys/arch/i386/compile/GENERIC

The setup:
==========

             +----------+
             | transit  |
             | AS 65001 |
             +--+---+---+
                |em0| .1
                +---+
                  |
     +------------+-----------------+
     | .2   192.168.113.0/24        | .3
   +---+                          +---+
   |em0|                          |em0|
 +-+---+----+                   +-+---+----+
 | b1       |                   | b2       |
 | AS 65002 |                   | AS 65002 |
 +-+---+----+                   +-+---+----+
   |em1| .1                       |em1| .2
   +---+                          +---+
     |       192.168.114.0/24       |
     +---------+--------------------+
               |
             +---+
             |em0| .10
           +-+---+----+
           |  lb1     |
           +-+-----+--+
             |carp1| 192.168.240.1/24
             +-----+


transit announces 65k routes.

For brevity I'm going to ignore b2 from now on since I think the
problem is triggerable without b2 - its config is symmetric to b1.

[florian@openbsd-b1:~]$ sudo grep -v \# /etc/bgpd.conf | grep -v ^$ 
AS 65002
router-id 192.168.113.2
network 192.168.114.0/24
network 192.168.115.0/24
neighbor 192.168.113.1 {
        descr "openbsd-transit"
        announce self
        local-address 192.168.113.2
        remote-as 65001
}
neighbor 192.168.114.2 {
        descr "openbsd-b2"
        announce all
        local-address 192.168.114.1
        remote-as 65002
}
deny from any
allow from any inet prefixlen 8 - 24
allow from any inet6 prefixlen 16 - 48
allow from any prefix 0.0.0.0/0

[florian@openbsd-b1:~]$ sudo grep -v \# /etc/ospfd.conf  | grep -v ^$ 
router-id 192.168.113.2
fib-update yes
metric 10
redistribute static
redistribute connected
redistribute default set { metric 300 type 2 }
area 0.0.0.0 {
        interface em1 { metric 5 }
}

[root@openbsd-lb1:~]# grep -v \# /etc/ospfd.conf  | grep -v ^$ 
router-id 192.168.114.10
fib-update yes
metric 10
area 0.0.0.0 {
        interface em2 { metric 5 }
        interface em0 { demote carp }
        interface carp1 { passive }
}

------------------------------------------------------------------------

When I flap the ospf route from lb1 by running ifconfig em0 down/up on
lb1, RES jumps from 20M to 45M. By doing this often enough I was able
to get RES to 300+M. It's a bit tricky because of the ospf
router-dead-time, but this works reliably:

while true 
do 
   echo ifconfig em0 down
   ifconfig em0 down
   sleep 35
   echo ifconfig em0 up
   ifconfig em0 up
   sleep 35
done

Putting a lot of log statements into bgpd I see this:

From dispatch_rtmsg_addr we come to kroute_insert in kroute.c.

This condition is true twice:
                        if (h->nexthop.aid == AID_INET &&
                            (ntohl(h->nexthop.v4.s_addr) & mask) == ina)


Once with kr->r of 0.0.0.0/0 and once with 192.168.113.0/24.
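
To illustrate why both routes match, here is a minimal standalone sketch
of the same mask comparison. It is not the bgpd code: mask_from_plen()
and nexthop_in_prefix() are hypothetical helpers, and I am assuming the
nexthop in question is 192.168.113.1 (the transit neighbor).

```c
#include <stdint.h>
#include <arpa/inet.h>		/* ntohl(), inet_pton() */
#include <netinet/in.h>

/* Hypothetical helper: host-order netmask for a prefix length 0..32. */
static uint32_t
mask_from_plen(uint8_t plen)
{
	if (plen == 0)
		return 0;	/* avoid an undefined 32-bit shift */
	return 0xffffffffU << (32 - plen);
}

/*
 * Hypothetical helper: returns 1 if nexthop lies inside net/plen,
 * 0 if it does not, -1 on invalid input.
 */
static int
nexthop_in_prefix(const char *nexthop, const char *net, uint8_t plen)
{
	struct in_addr nh, n;
	uint32_t mask, ina;

	if (plen > 32 ||
	    inet_pton(AF_INET, nexthop, &nh) != 1 ||
	    inet_pton(AF_INET, net, &n) != 1)
		return -1;

	mask = mask_from_plen(plen);
	ina = ntohl(n.s_addr) & mask;

	/* Same shape as the comparison quoted from kroute.c above. */
	return (ntohl(nh.s_addr) & mask) == ina;
}
```

With these helpers, nexthop_in_prefix("192.168.113.1", "0.0.0.0", 0) and
nexthop_in_prefix("192.168.113.1", "192.168.113.0", 24) both return 1,
while the same nexthop against 192.168.114.0/24 returns 0. The default
route matches any nexthop because its mask is 0.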

Via knexthop_validate(kroute.c) -> send_nexthop_update(bgpd.c) 
  [...] -> nexthop_update (rde_rib.c) we come to prefix_updateall
where 

                if (oldstate == state && state == NEXTHOP_REACH) {

is true. There we basically get a copy of the RIB as individual imsgs,
which kroute.c cannot consume fast enough. The imsgs are queued in the
rde and RES size increases.
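
A toy model of that backlog, assuming a producer that emits messages
faster than the consumer drains them. This is only a sketch of the
effect, not of bgpd's imsg handling: peak_backlog() and the per-tick
rates are made up for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model: `total` messages are produced at `prod` per tick
 * and consumed at `cons` per tick. Returns the largest number of
 * messages that are queued at any one time.
 */
static size_t
peak_backlog(size_t total, size_t prod, size_t cons)
{
	size_t left = total, queued = 0, peak = 0;
	size_t n, d;

	while (left > 0 || queued > 0) {
		/* Producer enqueues up to `prod` messages this tick. */
		n = left < prod ? left : prod;
		left -= n;
		queued += n;
		if (queued > peak)
			peak = queued;

		/* Consumer drains up to `cons` messages this tick. */
		d = queued < cons ? queued : cons;
		queued -= d;

		/* Nothing produced and nothing drained: no progress. */
		if (n == 0 && d == 0)
			break;
	}
	return peak;
}
```

For 65000 messages produced at 5000 per tick and drained at 1000 per
tick the peak is 53000 queued messages; if the consumer keeps up (1000
produced, 1000 drained) the peak stays at 1000. Memory use follows the
peak, not the total.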

So for some reason ospfd touches the route on the external interface
(192.168.113.0/24) and bgpd has to validate all routes.

I'm going to poke ospfd tomorrow. AFAICT bgpd behaves correctly.

Thanks,
Florian

-- 
Intuition is no proof. What concrete evidence do you have that you exist?
