On 18/10/2019 at 18:05, Stuart Henderson wrote:
> On 2019/10/18 16:23, Sacha wrote:
>> On 18/10/2019 at 13:22, Claudio Jeker wrote:
>>> On Fri, Oct 18, 2019 at 12:55:02AM -0700, Sacha wrote:
>>>> Dear all,
>>>>
>>>> First of all, sorry if this bug report is not complete; the issue is on our
>>>> production firewalls and each test cuts off our whole AS network, so we have
>>>> to be in the datacenter to go further.
>>> This is not good.
>>>
>>>> We have two firewalls in a master/slave CARP failover setup, running bgpd
>>>> and ospfd. After upgrading to 6.6, we have an issue when we reboot one of
>>>> our two firewalls: it makes the bgpd daemon on the other crash (our AS is
>>>> no longer announced).
>>>> This happens with both the master and the slave firewall: when we reboot
>>>> one, the other loses its BGP sessions.
>>>> What we know so far is that if we stop the ospfd & ospf6d daemons before
>>>> the reboot, the issue no longer occurs.
>>>> I'm going to the datacenter this afternoon and will try to reproduce it
>>>> with more logs.
>>>> All ideas for debugging are welcome.
>>>>
>> Just back from the datacenter after some tests.
>>
>> Let's use some names to make this easier: firewall 1, usually the master, is
>> Cerbere1; firewall 2 is Cerbere2.
>>
>> The issue occurs only if we shut down Cerbere2 (independently of its CARP
>> master/slave state): bgpd on Cerbere1 shuts down:
>>
>> Oct 18 15:16:35 cerbere1 bgpd[74950]: session engine exiting
>> Oct 18 15:16:41 cerbere1 bgpd[91574]: kernel routing table 0 (Loc-RIB)
>> decoupled
>> Oct 18 15:16:42 cerbere1 bgpd[91574]: route decision engine terminated;
>> signal 11
>> Oct 18 15:16:42 cerbere1 bgpd[91574]: terminating
>>
>> We tried to reproduce the issue when shutting down Cerbere1: no problem.
>> We will check whether the two configurations are the same.
>>
>> The strange thing is that when I launch bgpd on Cerbere1 from a shell
>> (bgpd -dv -c /etc/bgpd.conf), I have no issue (tested twice!).
>>
>>> Check /var/log/daemon: what did bgpd log before going down?
>>> I would be interested to see the bgpd-related syslog output.
>>>
>>> You can increase logging with "bgpctl log verbose", or just run bgpd
>>> in debug mode (bgpd -dvv).
>>>
>>> If one of the processes crashes (normally with a SIGSEGV or similar signal),
>>> then set the sysctl kern.nosuidcoredump=3 and create a directory called
>>> /var/crash/bgpd. Also make sure your limit for the coredumpsize is high
>>> enough. This should allow you to get a coredump of the crashing process.
>>> Once you have a core, it should be possible to get a backtrace.
>>>
>> Finally, it's not a process crash, it's just a clean shutdown, but it is
>> not expected and we don't know why.
> "route decision engine terminated; signal 11" is not a clean shutdown,
> it is SIGSEGV.
>
> If you followed Claudio's suggestion of setting sysctl kern.nosuidcoredump=3
> and creating /var/crash/bgpd, you should have a bgpd.core file.
>
Thanks for the clarification, Stuart. I will of course follow Claudio's
advice and send a debug log; I'll be back in the datacenter for that in two
days, more or less.
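For reference, the core-dump collection steps Claudio describes could be scripted roughly as below. This is a minimal sketch, not a tested recipe: it assumes OpenBSD, a root shell, and that bgpd lives at /usr/sbin/bgpd; the debugger invocation at the end is an assumption (depending on the release it may be gdb from base or egdb from ports). By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the core-dump setup suggested in the thread (OpenBSD, run as root).
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

run sysctl kern.nosuidcoredump=3    # dump cores of set[ug]id/pledged processes into /var/crash
run mkdir -p /var/crash/bgpd        # bgpd's core should land here as bgpd.core
run ulimit -c unlimited             # raise the coredumpsize limit

# After the next crash, get a backtrace from the core
# (debugger name is an assumption: gdb in base, or egdb from ports):
run gdb /usr/sbin/bgpd /var/crash/bgpd/bgpd.core
```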
Sacha.
