Your logs show segfaults (signal 11), so there should be a way to get core dumps saved from these. You will definitely need the sysctl (and a pre-created /var/crash/bgpd directory), because by default core dumps are disabled for processes which have changed uid/gid (the RDE and session engine both do this). If the parent (root) process also crashes, you should get a core file named bgpd.core in the directory it was started from, but that didn't happen in your log.
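A minimal sketch of that setup, run as root on the OpenBSD box. It assumes kern.nosuidcoredump=3 (the value Claudio suggests further down) makes the kernel write cores for uid-changing processes into /var/crash/<progname>/ when that directory already exists, and that bgpd is managed with rcctl. The core file name below is a hypothetical placeholder; check what actually shows up in the directory.

```sh
# Let processes that changed uid/gid dump core; with value 3 the core
# is assumed to land in /var/crash/<progname>/ if that directory exists.
sysctl kern.nosuidcoredump=3

# The per-program directory has to exist before the crash.
mkdir -p /var/crash/bgpd

# Start bgpd again (assumes it is enabled through rcctl) and wait for the crash.
rcctl restart bgpd

# After the crash, see what was written, then load it into gdb.
# 12345.core is a placeholder; use the file that ls actually lists.
ls -l /var/crash/bgpd
gdb /usr/sbin/bgpd /var/crash/bgpd/12345.core
# At the (gdb) prompt, "bt" prints the backtrace to send to the list.
```

The same kern.nosuidcoredump=3 line can go into /etc/sysctl.conf if the setting should survive a reboot.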

If you don't get anywhere with that, you can attach to the processes with "gdb /usr/sbin/bgpd $pid" (you will need to do this for each process separately). Attaching interrupts the process, so at the gdb prompt quickly type "c" to continue. Because the segfault happens so soon after startup for you, you will probably need to modify bgpd.conf to disable the peer ("passive" might give you enough time) so that you can connect gdb, continue, and then enable the peer. Getting core dumps working is much easier.
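Roughly, the attach route looks like this (as root). It assumes the bgpd children show up with descriptive titles in pgrep -lf output so the session engine and RDE can be told apart; 12345 is a placeholder PID, and each child needs its own gdb.

```sh
# List the bgpd processes with their full titles to tell the parent,
# session engine and route decision engine apart.
pgrep -lf bgpd

# Attach to one child (12345 is a placeholder PID). Attaching stops the
# process, so type "c" at the (gdb) prompt straight away.
gdb /usr/sbin/bgpd 12345

# Then enable the peer again in bgpd.conf and reload the config, e.g.
#   bgpctl reload
# When the SIGSEGV arrives gdb stops again; type "bt" there.
```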


Others have seen what they think might be the same crash, but only much longer after startup. Whatever conditions are making it trigger so quickly for you might go away as peers change things, so anything you can grab while it's still happening would be quite useful.


It might be useful to get a packet capture too: "tcpdump -s1500 -i $interface -w bgp.pcap port 179 and ip6". Send it off list (probably at least to me and Claudio) if you don't want to make it public.


--
Sent from a phone, apologies for poor formatting.

On 19 September 2018 11:53:14 "Aaron A. Glenn" <[email protected]> wrote:

* Claudio Jeker <[email protected]> [2018-09-19 09:15]:
On Tue, Sep 18, 2018 at 07:12:39PM +0000, Aaron A. Glenn wrote:

Sep 18 19:06:10 nairobi bgpd[92056]: startup
Sep 18 19:06:10 nairobi bgpd[92056]: rereading config
Sep 18 19:06:10 nairobi bgpd[97583]: route decision engine ready
Sep 18 19:06:10 nairobi bgpd[99160]: session engine ready
Sep 18 19:06:10 nairobi bgpd[99160]: listening on 0.0.0.0
Sep 18 19:06:10 nairobi bgpd[99160]: listening on ::
Sep 18 19:06:10 nairobi bgpd[99160]: SE reconfigured
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): state change None -> Idle, reason: None
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): state change Idle -> Connect, reason: Start
Sep 18 19:06:10 nairobi bgpd[97583]: change to/from route-collector mode ignored
Sep 18 19:06:10 nairobi bgpd[97583]: RDE reconfigured
Sep 18 19:06:10 nairobi bgpd[97583]: running softreconfig in
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): state change Connect -> OpenSent, reason: Connection opened
Sep 18 19:06:10 nairobi bgpd[97583]: RDE soft reconfiguration done
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): state change OpenSent -> OpenConfirm, reason: OPEN message received
Sep 18 19:06:10 nairobi bgpd[99160]: neighbor 2xxxx:xxxx::4 (nbo-v6): state change OpenConfirm -> Established, reason: KEEPALIVE message received
Sep 18 19:06:10 nairobi bgpd[97583]: neighbor 2xxxx:xxxx::4 (nbo-v6): sending IPv6 unicast EOR marker
Sep 18 19:06:12 nairobi bgpd[97583]: neighbor 2xxxx:xxxx::4 (nbo-v6): bad ASPATH, path invalidated and prefix withdrawn
Sep 18 19:06:12 nairobi last message repeated 157 times
Sep 18 19:06:12 nairobi bgpd[92056]: peer closed imsg connection
Sep 18 19:06:12 nairobi bgpd[99160]: peer closed imsg connection
Sep 18 19:06:12 nairobi bgpd[99160]: SE: Lost connection to RDE
Sep 18 19:06:12 nairobi bgpd[99160]: peer closed imsg connection
Sep 18 19:06:12 nairobi bgpd[99160]: SE: Lost connection to RDE control
Sep 18 19:06:12 nairobi bgpd[92056]: main: Lost connection to RDE
Sep 18 19:06:12 nairobi bgpd[92056]: session engine terminated; signal 11
Sep 18 19:06:12 nairobi bgpd[92056]: route decision engine terminated; signal 11
Sep 18 19:06:12 nairobi bgpd[92056]: terminating



nairobi# sysctl kern.version
kern.version=OpenBSD 6.4-beta (GENERIC.MP) #301: Tue Sep 18 08:25:16 MDT 2018
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Can you try a later snapshot? The 9 Sept snapshot is from right in the middle of n2k18,
and it could be that this is just an unlucky snapshot with something that
is already fixed. If that does not help, Stuart Henderson's mail has good
instructions.

I initially experienced the behavior with a 9 Sept snapshot, and immediately
re-deployed to the latest snapshot (18 Sept). Same result (bad ASPATH might be
new, but don't quote me on that).

If you can get a backtrace for the RDE process crashing, that would be
very helpful for me. One way of doing that is to attach gdb to the running
process; another is to get a core dump with kern.nosuidcoredump=3 set (and
mkdir /var/crash/bgpd).

I'm unsure how to get a backtrace, short of finding the right breakpoint to
set. Using all of my gdb knowledge, I set `follow-fork-mode child` and
`catch fork`, but that wasn't very fruitful (especially without symbols).

I will find time to follow sthen's (wonderfully complete) instructions and/or
make another attempt at getting a useful gdb backtrace after EuroBSDcon,
provided the latest snapshot then exhibits similar behavior.

Since it doesn't (seem to) dump core, I was unable to coax anything into
/var/crash/bgpd.


