Re: bgpd dying with IPv6 neighbor(s)

Stuart Henderson Wed, 19 Sep 2018 04:44:23 -0700

Your logs show segfaults (signal 11), so there should be a way to get coredumps saved from these. You will definitely need the sysctl (and a pre-created /var/crash/bgpd directory) because by default cores are disabled for processes which have changed uid/gid (the RDE and session engine do this). If the parent (root) process also crashes, you should get one in the current directory (from startup time) named bgpd.core, but this didn't happen in your log.
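The core-dump setup above might look something like this minimal sketch, run as root on the OpenBSD box. It assumes kern.nosuidcoredump=3 behaves as Claudio describes further down the thread, with cores from uid/gid-changing processes going under /var/crash/<progname>/. The commented-out /etc/sysctl.conf line is only needed if you want the setting to survive a reboot.

```shell
#!/bin/sh
# Sketch only: run as root on the OpenBSD host before starting bgpd again.

# 3 = processes that changed uid/gid (bgpd's RDE and session engine)
# are allowed to dump core, with the cores going under /var/crash/<progname>/
sysctl kern.nosuidcoredump=3

# The per-program directory has to exist beforehand, otherwise no core
# gets written for bgpd's child processes.
mkdir -p /var/crash/bgpd

# Optional (assumption: you want this permanent), keep it across reboots:
# echo 'kern.nosuidcoredump=3' >> /etc/sysctl.conf
```

After that, start bgpd again and, once it has died, look in /var/crash/bgpd for the child cores, and in the directory bgpd was started from for a bgpd.core from the parent.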

If you don't get anywhere with that, you can attach to processes with "gdb /usr/sbin/bgpd $pid" (you will need to do each of the processes separately). This will interrupt the process, so at the gdb prompt quickly type "c" to continue. Because the segfault happens so soon after startup for you, you will probably need to modify bgpd.conf to disable the peer ("passive" might give you enough time) so you can connect gdb, continue, then enable the peer. It's much easier to get coredumps working.
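A rough sketch of the attach approach, assuming pgrep(1) from base is available and that all three bgpd processes (parent, session engine, RDE) still show up under the process name "bgpd", as they do in the log. gdb_attach_cmds is a hypothetical helper that only prints one gdb command per process; since each process needs its own gdb, run each printed command in a separate terminal.

```shell
#!/bin/sh
# Sketch only: list one gdb attach command per running bgpd process.

# Hypothetical helper: print "gdb /usr/sbin/bgpd <pid>" for every pid given.
gdb_attach_cmds() {
	for pid in "$@"; do
		printf 'gdb /usr/sbin/bgpd %s\n' "$pid"
	done
}

# pgrep -x matches the exact process name; prints nothing if bgpd is not running.
gdb_attach_cmds $(pgrep -x bgpd)

# At each (gdb) prompt type "c" straight away so the process keeps running.
# With the neighbor set to "passive" in /etc/bgpd.conf during this step,
# take "passive" out again once all three are attached and run
# "bgpctl reload" so the session can come up (assumption: the reload is
# enough to make bgpd start connecting actively again).
```

With the processes attached this way, a segfault should stop the crashing process at the (gdb) prompt, where "bt" prints the backtrace.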

Others have seen what they think might be the same crash, but not until much longer after startup. Whatever conditions are causing it to trigger so quickly for you might go away as peers change things, so anything you can grab while it's still happening would be quite useful.

It might be useful to get a packet capture too: "tcpdump -s1500 -i $interface -w bgp.pcap port 179 and ip6". Send it off list (probably at least to me and Claudio) if you don't want to make it public.



--
Sent from a phone, apologies for poor formatting.

On 19 September 2018 11:53:14 "Aaron A. Glenn" <[email protected]> wrote:

* Claudio Jeker <[email protected]> [2018-09-19 09:15]:

On Tue, Sep 18, 2018 at 07:12:39PM +0000, Aaron A. Glenn wrote:
Sep 18 19:06:10 nairobi bgpd[92056]: startup
Sep 18 19:06:10 nairobi bgpd[92056]: rereading config
Sep 18 19:06:10 nairobi bgpd[97583]: route decision engine ready
Sep 18 19:06:10 nairobi bgpd[99160]: session engine ready
Sep 18 19:06:10 nairobi bgpd[99160]: listening on 0.0.0.0
Sep 18 19:06:10 nairobi bgpd[99160]: listening on ::
Sep 18 19:06:10 nairobi bgpd[99160]: SE reconfigured
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): statechange None -> Idle, reason: None
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): statechange Idle -> Connect, reason: Start
Sep 18 19:06:10 nairobi bgpd[97583]: change to/from route-collector mode ignored
Sep 18 19:06:10 nairobi bgpd[97583]: RDE reconfigured
Sep 18 19:06:10 nairobi bgpd[97583]: running softreconfig in
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): statechange Connect -> OpenSent, reason: Connection opened
Sep 18 19:06:10 nairobi bgpd[97583]: RDE soft reconfiguration done
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): statechange OpenSent -> OpenConfirm, reason: OPEN message received
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): statechange OpenConfirm -> Established, reason: KEEPALIVE message received
Sep 18 19:06:10 nairobi bgpd[97583]: neighbor 2xxxx:xxxx::4 (nbo-v6): sending IPv6 unicast EOR marker
Sep 18 19:06:12 nairobi bgpd[97583]: neighbor 2xxxx:xxxx::4 (nbo-v6): bad ASPATH, path invalidated and prefix withdrawn
Sep 18 19:06:12 nairobi last message repeated 157 times
Sep 18 19:06:12 nairobi bgpd[92056]: peer closed imsg connection
Sep 18 19:06:12 nairobi bgpd[99160]: peer closed imsg connection
Sep 18 19:06:12 nairobi bgpd[99160]: SE: Lost connection to RDE
Sep 18 19:06:12 nairobi bgpd[99160]: peer closed imsg connection
Sep 18 19:06:12 nairobi bgpd[99160]: SE: Lost connection to RDE control
Sep 18 19:06:12 nairobi bgpd[92056]: main: Lost connection to RDE
Sep 18 19:06:12 nairobi bgpd[92056]: session engine terminated; signal 11
Sep 18 19:06:12 nairobi bgpd[92056]: route decision engine terminated; signal 11
Sep 18 19:06:12 nairobi bgpd[92056]: terminating



nairobi# sysctl kern.version
kern.version=OpenBSD 6.4-beta (GENERIC.MP) #301: Tue Sep 18 08:25:16 MDT 2018
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Can you try a later snapshot? The 9 Sept one was right in the middle of n2k18,
and it could be that this is just an unlucky snapshot with something that
is already fixed. If that does not help, Stuart Henderson's mail has good
instructions.


I initially experienced the behavior with a 9 Sept snapshot, and immediately
re-deployed to the latest snapshot (18 Sept). Same result (bad ASPATH might be
new, but don't quote me on that).

If you can get a backtrace for the RDE process crashing, that would be
very helpful for me. One way of doing that is to attach gdb to the running
process, or to get a core dump with kern.nosuidcoredump=3 set (and mkdir
/var/crash/bgpd).
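Once a core shows up in /var/crash/bgpd, a backtrace can be pulled from it without setting any breakpoints. A sketch, assuming the cores end in ".core" (exact naming may differ on this snapshot) and using a hypothetical latest_core helper to pick the newest one; gdb's -batch and -x options run "bt" non-interactively. Without debug symbols the frames may show little more than addresses.

```shell
#!/bin/sh
# Sketch only: print a backtrace from the newest bgpd core dump.

# Hypothetical helper: newest *.core file in a directory, or nothing at all.
latest_core() {
	ls -t "$1"/*.core 2>/dev/null | head -n 1
}

core=$(latest_core /var/crash/bgpd)
if [ -n "$core" ]; then
	# gdb runs the commands in the -x file, and -batch makes it exit afterwards
	cmds=$(mktemp)
	echo bt > "$cmds"
	gdb -batch -x "$cmds" /usr/sbin/bgpd "$core"
	rm -f "$cmds"
else
	echo "no core found in /var/crash/bgpd" >&2
fi
```

The same "gdb /usr/sbin/bgpd <core>" followed by "bt" at the prompt works interactively too.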


I'm unsure how to get a backtrace, short of finding the right breakpoint to
set. Using all of my gdb knowledge, I set `follow-fork-mode child` and `catch
fork`, but that wasn't very fruitful (especially without symbols).

I will find time to follow sthen's (wonderfully complete) instructions and/or
make another attempt at getting a useful gdb backtrace after EuroBSDcon,
provided the latest snapshot then exhibits similar behavior.

Since it doesn't (seem to) coredump, I was unable to coax anything into
/var/crash/bgpd.
