On 2019/10/18 16:23, Sacha wrote:
> On 18/10/2019 at 13:22, Claudio Jeker wrote:
> > On Fri, Oct 18, 2019 at 12:55:02AM -0700, Sacha wrote:
> >> Dear all,
> >>
> >>  First of all, sorry if this bug report is incomplete: the issue is on our
> >> production firewalls and each test cuts off our whole AS network, so we
> >> have to be in the datacenter to go further.
> > This is not good.
> >
> >>  We have 2 firewalls in a master/slave CARP failover setup, with bgpd and
> >> OSPF.
> >>  After upgrading to 6.6, we have an issue when we reboot one of our two
> >> firewalls: it makes the bgpd daemon on the other one crash (our AS is no
> >> longer announced).
> >>  This occurs on both the master and the slave firewall: when we reboot
> >> one, the other loses its BGP.
> >>  What we know so far is that if we stop the ospf & ospf6 daemons before
> >> the reboot, the issue no longer occurs.
> >>  I'm going to the datacenter this afternoon, I will try to reproduce with
> >> more logs.
> >>  All ideas for debugging are welcome.
> >>
> Just back from the datacenter after some tests.
> 
> Let's have some names to make it easier: Firewall 1, usually the master, is
> Cerbere1; Firewall 2 is Cerbere2.
> 
> The issue occurs only if we shut down Cerbere2 (independently of its
> CARP master/slave state): bgpd on Cerbere1 shuts down:
> 
> Oct 18 15:16:35 cerbere1 bgpd[74950]: session engine exiting
> Oct 18 15:16:41 cerbere1 bgpd[91574]: kernel routing table 0 (Loc-RIB) decoupled
> Oct 18 15:16:42 cerbere1 bgpd[91574]: route decision engine terminated; signal 11
> Oct 18 15:16:42 cerbere1 bgpd[91574]: terminating
> 
> We tried to reproduce the issue by shutting down Cerbere2: no problem.
> We will check whether all the configurations are the same.
> 
> The strange thing is that when I launch bgpd on Cerbere1 from a shell
> (bgpd -dv -c /etc/bgpd.conf), I have no issue (tested twice!).
> 
> > Check /var/log/daemon: what did bgpd log before going down?
> > I would be interested to see the bgpd related syslog output.
> >
> > You can increase logging with bgpctl log verbose or just run bgpd
> > in debug mode (bgpd -dvv).
> >
> > If one of the processes crashes (normally by a SIGSEGV or similar signal)
> > then set the sysctl kern.nosuidcoredump=3 and create a directory called
> > /var/crash/bgpd. Also make sure your limit for the coredumpsize is high
> > enough. This should allow you to get a coredump of the crashing process.
> > Once you have a core it should be possible to get a backtrace.
> >
> Finally, it's not a process crash, it's just a clean shutdown, but it is not
> expected and we don't know why.

"route decision engine terminated; signal 11" is not a clean shutdown,
it is SIGSEGV.

If you followed Claudio's suggestion of setting sysctl kern.nosuidcoredump=3
and creating /var/crash/bgpd, you should have a bgpd.core file.
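
For reference, the steps Claudio describes could look roughly like the
sketch below. It is not a tested procedure: setup_coredump and newest_core
are made-up helper names for illustration, the wildcard is there because
the exact core file name may differ on your system, and the process must
also be allowed a large enough coredumpsize limit, which this sketch does
not change.

```shell
#!/bin/sh
# Sketch only: prepare for a bgpd core dump and locate it afterwards.

# Hypothetical helper: apply the settings quoted from Claudio's mail.
# Must be run as root on the firewall where bgpd dies.
setup_coredump() {
	sysctl kern.nosuidcoredump=3
	mkdir -p /var/crash/bgpd
}

# Hypothetical helper: print the most recently modified *.core file in
# the given directory, or nothing at all if there is none.
newest_core() {
	ls -t "$1"/*.core 2>/dev/null | head -n 1
}
```

After calling setup_coredump and reproducing the crash, something like
gdb /usr/sbin/bgpd "$(newest_core /var/crash/bgpd)" followed by "bt" at the
gdb prompt should print the backtrace (assuming a usable gdb is installed).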
