------------------------------------------------------------------------
*From:* Stuart Henderson <[email protected]>
*Subject:* BGPD crash on 6.6
*Date:* Friday, 18 October 2019 at 18:05
*To:* Sacha <[email protected]>
*Cc :* Claudio Jeker <[email protected]>, [email protected]

> On 2019/10/18 16:23, Sacha wrote:
>> On 18/10/2019 at 13:22, Claudio Jeker wrote:
>>> On Fri, Oct 18, 2019 at 12:55:02AM -0700, Sacha wrote:
>>>> Dear all,
>>>>
>>>>  First of all, sorry if this bug report is incomplete: the issue is on
>>>> our production firewalls and each test cuts off our whole AS network,
>>>> so we have to be in the datacenter to go further.
>>> This is not good.
>>>
>>>>  We have 2 firewalls in a master/slave CARP failover setup, running
>>>> BGPD and OSPF.
>>>>  After upgrading to 6.6, we have an issue: when we reboot one of our two
>>>> firewalls, it makes the BGPD daemon on the other one crash (our AS is no
>>>> longer announced).
>>>>  This occurs on both the master and the slave firewall: when we reboot
>>>> one, the other loses its BGP.
>>>>  What we know so far is that if we stop the ospf & ospf6 daemons before
>>>> the reboot, the issue no longer occurs.
>>>>  I'm going to the datacenter this afternoon, I will try to reproduce with
>>>> more logs.
>>>>  All ideas for debugging are welcome.
>>>>
>> Just back from the datacenter after some tests.
>>
>> Let's have some names to make it easier: Firewall 1, usually the master,
>> is Cerbere1; Firewall 2 is Cerbere2.
>>
>> The issue occurs only if we shut down Cerbere2 (independently of its
>> CARP master/slave state): the bgpd on Cerbere1 shuts down:
>>
>> Oct 18 15:16:35 cerbere1 bgpd[74950]: session engine exiting
>> Oct 18 15:16:41 cerbere1 bgpd[91574]: kernel routing table 0 (Loc-RIB) decoupled
>> Oct 18 15:16:42 cerbere1 bgpd[91574]: route decision engine terminated; signal 11
>> Oct 18 15:16:42 cerbere1 bgpd[91574]: terminating
>>
>> We tried to reproduce the issue when shutting down Cerbere2, no problem.
>> We will check whether all the configurations are the same.
>>
>> The strange thing is that when I launch bgpd on Cerbere1 from a shell
>> (bgpd -dv -c /etc/bgpd.conf) I have no issue (tested twice!).
>>
>>> Check /var/log/daemon: what did bgpd log before going down?
>>> I would be interested to see the bgpd related syslog output.
>>>
>>> You can increase logging with bgpctl log verbose or just run bgpd
>>> in debug mode (bgpd -dvv).
>>>
>>> If one of the processes crashes (normally by a SIGSEGV or similar signal)
>>> then set the sysctl kern.nosuidcoredump=3 and create a directory called
>>> /var/crash/bgpd. Also make sure your limit for the coredumpsize is high
>>> enough. This should allow you to get a coredump of the crashing process.
>>> Once you have a core it should be possible to get a backtrace.
>>>
>> Finally, it's not a process crash, it's just a clean shutdown, but it is
>> not expected and we don't know why.
> "route decision engine terminated; signal 11" is not a clean shutdown,
> it is SIGSEGV.
>
> If you followed Claudio's suggestion of setting sysctl kern.nosuidcoredump=3
> and creating /var/crash/bgpd, you should have a bgpd.core file.
>
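To make the point about "signal 11" concrete, here is a minimal sketch that extracts the signal number from the syslog line quoted above and maps it to its symbolic name. The helper name `terminating_signal` and the regular expression are illustrative assumptions, not part of bgpd or its tooling; the lookup relies only on Python's standard `signal` module.

```python
import re
import signal

# Log line copied from the report above (rejoined onto one line).
LOG_LINE = ("Oct 18 15:16:42 cerbere1 bgpd[91574]: "
            "route decision engine terminated; signal 11")


def terminating_signal(line):
    """Return the signal name if the line reports death by signal, else None.

    Hypothetical helper: the pattern only matches the
    "terminated; signal N" wording seen in the log above.
    """
    match = re.search(r"terminated; signal (\d+)", line)
    if match is None:
        return None
    # signal.Signals maps a signal number to the platform's enum member.
    return signal.Signals(int(match.group(1))).name


print(terminating_signal(LOG_LINE))  # prints SIGSEGV
```

A line without a "terminated; signal N" part, such as the "terminating" message that follows, yields None, which is what a clean shutdown would look like to this helper.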
Hi,

Good news: my last try to reproduce the issue didn't crash BGPD.

We have cleaned our bgpd.conf and it fixed the issue.

Sacha.


PS: Long live OpenBGPD!!! I'm really happy that Claudio works on it
as a full-time job; this is a great improvement for the community.
