On 2021-10-20 07:30, Otto Moerbeek wrote:
On Tue, Oct 19, 2021 at 09:47:22PM +0200, Martijn van Duren wrote:
On Tue, 2021-10-19 at 19:56 +0200, Otto Moerbeek wrote:
> On Tue, Oct 19, 2021 at 07:49:15PM +0200, Mischa wrote:
> > On 2021-10-15 20:05, Otto Moerbeek wrote:
> > > On Fri, Oct 15, 2021 at 07:47:22PM +0200, Mischa wrote:
> > > > On 2021-10-15 19:42, Otto Moerbeek wrote:
> > > > > On Fri, Oct 15, 2021 at 07:16:55PM +0200, Mischa wrote:
> > > > >
> > > > > > On 2021-10-15 18:27, Otto Moerbeek wrote:
> > > > > > >
> > > > > > > The actual problem (SIGSEGV) happens in the child processes: ktrace the
> > > > > > > children as well: ktrace -di ...
> > > > > > >
> > > > > > >      -Otto
> > > > > >
> > > > > > Thanx Otto.
> > > > > > Below is the kdump with ktrace -di.
> > > > > > It's quite a lot of data, but I didn't want to remove anything that could
> > > > > > potentially be useful.
> > > > > >
> > > > > > Mischa
> > > > > >
> > > > >
> > > > > The pattern below happens multiple times:
> > > > >
> > > > > A recvfrom of 101 bytes and after that a SIGSEGV.
> > > > >
> > > > > Now we do not know for sure if those two lines are related.
> > > > >
> > > > > I suspect that it is no coincidence that the 101 is one larger than
> > > > > 100...
> > > > >
> > > > > No other clue yet.
> > > >
> > > > Anything else I can collect?
> > >
> > > You might want to compile and install nsd with debug symbols:
> > >
> > >  cd /usr/src/usr.sbin/nsd
> > >  make -f Makefile.bsd-wrapper obj
> > >  make -f Makefile.bsd-wrapper clean
> > >  DEBUG=-g make -f  Makefile.bsd-wrapper
> > >  make -f  Makefile.bsd-wrapper install
> > >
> > >
> > > Then collect a gdb trace from a running process: install gdb from ports
> > > and run
> > >  egdb --pid=pidofnsdchild /usr/sbin/nsd
> > >
> > > and wait for the crash.
> > >
> > > But I'm mostly unfamiliar with the nsd code and what has been changed
> > > recently.  I'd say make sure sthen@ and florian@ see this: move to
> > > bugs@ as I do not know if they read misc@.
> >
> > Thanx Otto.
> >
> > As this is my first time using gdb, I need some assistance.
> >
> > root@name2:~ # ps -aux | grep nsd
> > _nsd     79188  0.0  1.0 101704 86400 ??  Ip      7:31PM    0:00.20 nsd: xfrd (nsd)
> > _nsd     24002  0.0  0.4 37188 37388 ??  Ip      7:31PM    0:00.29 nsd: main (nsd)
> > _nsd     44937  0.0  0.2 37544 18308 ??  Sp      7:45PM    0:00.11 nsd: server 1 (nsd)
> >
> > root@name2:~ # egdb --pid=44937 /usr/sbin/nsd
> > GNU gdb (GDB) 7.12.1
> > Copyright (C) 2017 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-unknown-openbsd7.0".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> > <http://www.gnu.org/software/gdb/documentation/>.
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from /usr/sbin/nsd...(no debugging symbols found)...done.
> > Attaching to program: /usr/sbin/nsd, process 44937
> > Reading symbols from /usr/lib/libssl.so.50.0...done.
> > Reading symbols from /usr/lib/libcrypto.so.47.0...done.
> > Reading symbols from /usr/lib/libevent.so.4.1...done.
> > Reading symbols from /usr/lib/libc.so.96.1...done.
> > Reading symbols from /usr/libexec/ld.so...done.
> > [Switching to thread 563101]
> > kevent () at /tmp/-:3
> > 3       /tmp/-: No such file or directory.
> >
> > Anything I am missing?
> >
> > Mischa
> >
>
> Do you see a gdb prompt? If so
>
>   continue
>
> should do it (and then wait for the crash).
>
> If you still see the crashes, a tcpdump of the traffic to nsd might
> help as well; I can replay that locally against nsd. I would also
> need your nsd config for that.
>
>    -Otto
>
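Otto's observation that each SIGSEGV followed a 101-byte recvfrom can be checked mechanically across a long trace instead of by eye. Below is a minimal sketch in sh and awk. It assumes the default kdump line layout (pid, command name, record type, then details, with return lines such as `RET   recvfrom 101/0x65`); `segv_after_recvfrom` is a hypothetical helper name, and the field numbers would need adjusting if kdump prints extra columns (for example thread ids).

```sh
# Hypothetical helper: for every SIGSEGV in kdump output, report the size
# returned by the same process's most recent recvfrom.
# Assumes default kdump columns: $1 pid, $2 command, $3 record type, $4 details.
segv_after_recvfrom() {
	awk '
	$3 == "RET" && $4 == "recvfrom" {
		split($5, r, "/")	# "101/0x65" -> r[1] is "101"
		last[$1] = r[1]		# remember the latest size per pid
	}
	$3 == "PSIG" && $4 == "SIGSEGV" {
		if ($1 in last)
			printf "pid %s: SIGSEGV after recvfrom of %s bytes\n", $1, last[$1]
		else
			printf "pid %s: SIGSEGV with no earlier recvfrom\n", $1
	}
	' "$@"
}
```

With no file argument it reads standard input, so it can sit at the end of a pipe such as `kdump -f ktrace.out | segv_after_recvfrom`. If every line reports 101 bytes, that supports the suspicion that the crash is tied to a specific query size rather than being a coincidence.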
I did some debugging with Mischa.

Unfortunately I misclicked and deleted the backtrace. However, the
problem was that query.c calls add_rrset (query.c:736) from
answer_delegation (query.c:917), where rrset is NULL.

Looking at the original queries, it was always a PTR request for an
IPv6 address (an ip6.arpa name). Going through the zone file, we tried
removing some likely suspect entries to see if we could pinpoint the
root cause, but after re-adding everything it wouldn't crash anymore.

Adding a simple comment to the zonefile of the second NS server
yielded the same result: the server won't crash anymore.

Mischa is going to monitor the situation to see if the issues
return, but my current best guess is that some weird state got
cached somewhere somehow and got flushed when saving the
zonefile.

martijn@


Maybe some form of corruption in the zonefile that was removed when
saving? Who knows... Anyway, thanks for taking care of this.

Unfortunately our joy was short-lived. This morning I noticed a lot of these:
Oct 20 07:44:15 name1 nsd[80814]: server 76410 died unexpectedly with status 11, restarting
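The status 11 in that message appears to match signal 11 (SIGSEGV), the same crash seen in the ktrace output. To put a number on "a lot", a minimal sketch follows. It assumes the exact message wording above and that nsd logs through syslog to `/var/log/daemon` (where OpenBSD's stock syslog.conf sends daemon-facility messages); `nsd_crash_count` and `nsd_crashes_per_minute` are hypothetical helper names.

```sh
# Hypothetical helper: count nsd child crashes recorded in a syslog file.
# Defaults to /var/log/daemon; pass another file (e.g. a rotated log) as $1.
nsd_crash_count() {
	grep -c 'died unexpectedly with status 11' "${1:-/var/log/daemon}"
}

# Hypothetical helper: crashes grouped per minute, using the syslog
# timestamp (month, day, and the HH:MM part of the time field).
nsd_crashes_per_minute() {
	grep 'died unexpectedly with status 11' "${1:-/var/log/daemon}" |
	    awk '{ print $1, $2, substr($3, 1, 5) }' | uniq -c
}
```

Running the count before and after upgrading to 4.3.8 gives a simple way to tell whether the crashes really stopped, rather than relying on spotting them in the log by hand.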

It looks like there is a potential fix in version 4.3.8:

https://github.com/NLnetLabs/nsd/issues/195
https://github.com/NLnetLabs/nsd/issues/189

https://github.com/NLnetLabs/nsd/blob/NSD_4_3_8_REL/doc/ChangeLog
23 August 2021: Wouter
- Fix #189: nsd 4.3.7 crash answer_delegation: Assertion `query->delegation_rrset' failed.

(Thanx Roger!)

As far as I can tell from what Martijn found, this might be the cause.

Will give that a try and report back.

Mischa
