On 2019/10/18 16:23, Sacha wrote:
> Le 18/10/2019 à 13:22, Claudio Jeker a écrit :
> > On Fri, Oct 18, 2019 at 12:55:02AM -0700, Sacha wrote:
> >> Dear all,
> >>
> >> first of all sorry if this bug report is not complete, the issue is on our
> >> production firewalls and each test cut all our AS network, we have to be in
> >> the datacenter to go further.
> > This is not good.
> >
> >> We have 2 firewalls on master/slave Carp failover, with BGPD and OSPF.
> >> After upgrading on 6.6, we have an issue when we reboot one of our two
> >> firewalls, it make the other crash the BGPD daemon (our AS is no more
> >> announced).
> >> This occurs even on master and slave firewall, when we reboot one the
> >> other
> >> looses it's bgp.
> >> What we know so far is if we stop ospf & ospf6 daemons before the reboot,
> >> there is no more issue.
> >> I'm going to the datacenter this afternoon, I will try to reproduce with
> >> more logs.
> >> All ideas for debugging are welcome.
> >>
> Just back from the datacenter after some tests.
>
> Let's have some names to make it easier: Firewall 1 usualy the master is
> Cerbere1, Firewall 2 is Cerbere2
>
> The issue occurs only if we shutdown Cerbere2 (idenpendantly of his
> state of carp master/slave): the bgpd on Cerbere1 shuts down:
>
> Oct 18 15:16:35 cerbere1 bgpd[74950]: session engine exiting
> Oct 18 15:16:41 cerbere1 bgpd[91574]: kernel routing table 0 (Loc-RIB)
> decoupled
> Oct 18 15:16:42 cerbere1 bgpd[91574]: route decision engine terminated;
> signal 11
> Oct 18 15:16:42 cerbere1 bgpd[91574]: terminating
>
> We tried to reproduce the issue when shuting down Cerbere2, no problem.
> We will check if all the configurations are the sames.
>
> The strange thing is when I launch bgpd on Cerbere1 from shell (bgpd -dv
> -c /etc/bgpd.conf) I have no issue (tested twice !).
>
> > Check /var/log/daemon what did bgpd log before going down?
> > I would be interested to see the bgpd related syslog output.
> >
> > You can increase logging with bgpctl log verbose or just run bgpd
> > in debug more (bgpd -dvv).
> >
> > If one of the process crashes (normally by a SIGSEGV or similar signal)
> > then set the sysctl kern.nosuidcoredump=3 and create a directory called
> > /var/crash/bgpd. Also make sure your limit for the coredumpsize is high
> > enough. This should allow you to get a coredump of the crashing process.
> > Once you have a core it should be possible to get a backtrace.
> >
> Finaly, it's not a process crash it just a clean shutdown, but it is not
> excepected and we don't know why.

"route decision engine terminated; signal 11" is not a clean shutdown, it is SIGSEGV. If you followed Claudio's suggestion of setting sysctl kern.nosuidcoredump=3 and creating /var/crash/bgpd, you should have a bgpd.core file.

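To tell a crash from a clean exit when going through /var/log/daemon, it can help to translate the signal number into its name. Below is a rough Python sketch of that idea, not anything shipped with bgpd or OpenBSD: the function name and the regular expression are my own assumptions, based only on the "terminated; signal N" wording in the log lines quoted above.

```python
import re
import signal

# Assumption: a process killed by a signal is logged as "... terminated; signal N",
# as in the quoted bgpd output. Other daemons may word this differently.
SIGNAL_RE = re.compile(r"terminated; signal (\d+)")


def describe_termination(line):
    """Return the signal name if the log line reports death by signal, else None."""
    match = SIGNAL_RE.search(line)
    if match is None:
        return None
    # signal.Signals maps the number to the platform's symbolic name.
    return signal.Signals(int(match.group(1))).name


# Log lines taken from the quoted message, with the wrapped line rejoined.
log_lines = [
    "Oct 18 15:16:35 cerbere1 bgpd[74950]: session engine exiting",
    "Oct 18 15:16:42 cerbere1 bgpd[91574]: route decision engine terminated; signal 11",
    "Oct 18 15:16:42 cerbere1 bgpd[91574]: terminating",
]

for line in log_lines:
    name = describe_termination(line)
    if name is not None:
        print(f"killed by {name}: {line}")
```

Only the second line is reported, which is the point: the "exiting" and "terminating" messages look like an orderly shutdown, but the route decision engine itself was killed by a signal.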