On Thu, Mar 31, 2022 at 09:06:05PM +0200, Laurent CARON wrote:
> On 29/03/2022 at 12:10, Claudio Jeker wrote:
> > I doubt it is the filters. You run into some sort of memory leak. Please
> > monitor 'bgpctl show rib mem' output. Also check ps aux | grep bgpd output
> > to see why and when the memory starts to go up.
> > With that information it may be possible to figure out where this leak
> > sits and how to fix it.
> > 
> > Cheers
> 
> 
> Hi Claudio,
> 
> Please find the output of 'bgpctl show rib mem' just 1 minute before the
> crash:
> 
> cat 2022-03-30::15:07:01.mem
> RDE memory statistics
>     909685 IPv4 unicast network entries using 34.7M of memory
>     272248 IPv6 unicast network entries using 14.5M of memory
>    2363169 rib entries using 144M of memory
>   14616410 prefix entries using 1.7G of memory
>    1539060 BGP path attribute entries using 106M of memory
>            and holding 14616410 references
>     635275 BGP AS-PATH attribute entries using 33.7M of memory
>            and holding 1539060 references
>      47399 entries for 681150 BGP communities using 15.1M of memory
>            and holding 14616410 references
>      22139 BGP attributes entries using 865K of memory
>            and holding 3436885 references
>      22138 BGP attributes using 175K of memory
>     270121 as-set elements in 249193 tables using 9.7M of memory
>     452138 prefix-set elements using 19.0M of memory
> RIB using 2.1G of memory
> Sets using 28.7M of memory
> 
> RDE hash statistics
>         path hash: size 131072, 1539060 entries
>             min 0 max 31 avg/std-dev = 11.742/3.623
>         aspath hash: size 131072, 635275 entries
>             min 0 max 16 avg/std-dev = 4.847/2.123
>         comm hash: size 16384, 47399 entries
>             min 0 max 12 avg/std-dev = 2.893/1.622
>         attr hash: size 16384, 22139 entries
>             min 0 max 8 avg/std-dev = 1.351/1.084

The numbers look reasonable, with the possible exception of the prefix and
BGP path attribute counts. Unless this system is pushing or pulling lots of
full feeds to peers I would not expect such a high number of prefixes. The
number of path attributes is also high, but that could again be reasonable
if many different full feeds are involved.
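As a rough sanity check from the numbers above: 14,616,410 prefix entries
over 2,363,169 rib entries is a bit more than 6 paths per destination,
which would more or less match half a dozen full feeds.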
 
> Here is the output of 'ps aux | grep bgp' one minute before the crash:
> 
> _bgpd    25479 100.1 40.1 33547416 33620192 ??  Rp/2   Tue09AM 1755:38.49 bgpd: route
> _bgpd     8696 31.6  0.0 15800 13240 ??  Sp     Tue09AM  626:35.66 bgpd: sessio
> _bgpd    46603  0.0  0.0 22728 25876 ??  Ip     Tue09AM    1:29.11 bgpd: rtr en
> root     94644  0.0  0.0   196   916 ??  Rp/3    3:07PM    0:00.00 grep bgpd
 
Interesting, the size is around 3GB, which is somewhat reasonable.
What surprises me is the high CPU load and the time spent in both the RDE
and the SE. One of my core routers, running since last September, has
accumulated about the same CPU time as your box did in just a few days.
It seems that there is a lot of churn.

I'm not sure why the processes get killed at around 3GB. It feels like you
hit the ulimit. See Stuart's mail about how to look into that.
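A quick way to look at this (just a sketch, assuming bgpd is started via
rc.d and therefore runs under the default 'daemon' login class):

  # show the resource limits configured for the daemon class
  grep -A12 '^daemon:' /etc/login.conf

If a datasize-cur/datasize-max entry there sits around 3GB that would
explain why the RDE dies at that size; after raising it, run
'cap_mkdb /etc/login.conf' if you use the .db file and restart bgpd.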
 
So looking at this output I feel like you have somehow created a BGP update
loop, where one or more systems constantly send UPDATEs to each other: the
moment an update is processed, the route decision changes and flaps back,
resulting in a withdraw or another update.

You can check the 'bgpctl show' and 'bgpctl show nei <neighbor>' output to
see between which peers many messages are exchanged. From there you need to
figure out which prefixes cause this update storm. Most probably some
filter rule is responsible.
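For example (a rough sketch, <neighbor> being whatever address or
description you configured):

  # summary with per-peer MsgRcvd/MsgSent counters
  bgpctl show

  # per-peer detail; the message statistics show how many UPDATEs
  # were sent and received
  bgpctl show neighbor <neighbor>

A peer whose UPDATE counters keep climbing even though the table should be
stable is a good candidate for such a loop.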

My assumption is that because of this UPDATE loop the systems slowly kill
each other by pushing more and more updates into various buffers along the
way.

> During the crash, bgpctl show rib mem doesn't work.
> Here is the ps aux | grep bgp output during the crash:
> 
> _bgpd    25479  0.0  0.0     0     0 ??  Zp     -          0:00.00 (bgpd)
> _bgpd    46603  0.0  0.0     0     0 ??  Zp     -          0:00.00 (bgpd)
> _bgpd     8696  0.0  0.0     0     0 ??  Zp     -          0:00.00 (bgpd)
> root     76428  0.0  0.0   180   772 ??  R/2     3:08PM    0:00.00 grep bgpd
> 
> 
> Please note /var/log/messages output:
> 
> Mar 30 15:07:27 bgpgw-004 bgpd[17103]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[17103]: main: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: RTR: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[46603]: fatal in RTR: Lost connection to
> parent
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to RDE
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to RDE control
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: peer closed imsg connection
> Mar 30 15:07:27 bgpgw-004 bgpd[8696]: SE: Lost connection to parent

That is just the tail end of bgpd "crashing". The real reason is further
up, but it is most probably a memory allocation failure.
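Something like this should find it (assuming the default syslog
configuration):

  grep bgpd /var/log/messages | grep -Ei 'fatal|memory'

and then look at the lines just before the 'Lost connection' cascade.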

-- 
:wq Claudio
