On Tue, Oct 19, 2021 at 09:47:22PM +0200, Martijn van Duren wrote:

> On Tue, 2021-10-19 at 19:56 +0200, Otto Moerbeek wrote:
> > On Tue, Oct 19, 2021 at 07:49:15PM +0200, Mischa wrote:
> > 
> > > On 2021-10-15 20:05, Otto Moerbeek wrote:
> > > > On Fri, Oct 15, 2021 at 07:47:22PM +0200, Mischa wrote:
> > > > > On 2021-10-15 19:42, Otto Moerbeek wrote:
> > > > > > On Fri, Oct 15, 2021 at 07:16:55PM +0200, Mischa wrote:
> > > > > > 
> > > > > > > On 2021-10-15 18:27, Otto Moerbeek wrote:
> > > > > > > > 
> > > > > > > > The actual problem (SIGSEGV) happens in the child processes:
> > > > > > > > ktrace the children as well: ktrace -di ...
> > > > > > > > 
> > > > > > > >         -Otto
> > > > > > > 
> > > > > > > Thanx Otto.
> > > > > > > Below is the kdump with ktrace -di.
> > > > > > > It's quite a lot of data but I didn't want to remove something
> > > > > > > that could potentially be useful.
> > > > > > > 
> > > > > > > Mischa
> > > > > > > 
> > > > > > 
> > > > > > The pattern below happens multiple times:
> > > > > > 
> > > > > > A recvfrom of 101 bytes and after that a SIGSEGV.
> > > > > > 
> > > > > > Now we do not know for sure if those two lines are related.
> > > > > > 
> > > > > > I suspect that it is no coincidence that the 101 is one larger than
> > > > > > 100...
> > > > > > 
> > > > > > No other clue yet.
> > > > > 
> > > > > Anything else I can collect?
> > > > 
> > > > You might want to compile and install nsd with debug symbols:
> > > > 
> > > >         cd /usr/src/usr.sbin/nsd
> > > >         make -f Makefile.bsd-wrapper obj
> > > >         make -f Makefile.bsd-wrapper clean
> > > >         DEBUG=-g make -f  Makefile.bsd-wrapper
> > > >         make -f  Makefile.bsd-wrapper install
> > > > 
> > > > 
> > > > Then: collect a gdb trace from a running process: install gdb from
> > > > ports, run
> > > >         egdb --pid=pidofnsdchild /usr/sbin/nsd
> > > > 
> > > > and wait for the crash.
> > > > 
> > > > But I'm mostly unfamiliar with the nsd code and what has been changed
> > > > recently.  I'd say make sure sthen@ and florian@ see this: move to
> > > > bugs@ as I do not know if they read misc@.
> > > 
> > > Thanx Otto.
> > > 
> > > As this is my first time using gdb, I need some assistance.
> > > 
> > > root@name2:~ # ps -aux | grep nsd
> > > _nsd     79188  0.0  1.0 101704 86400 ??  Ip      7:31PM    0:00.20 nsd: xfrd (nsd)
> > > _nsd     24002  0.0  0.4 37188 37388 ??  Ip      7:31PM    0:00.29 nsd: main (nsd)
> > > _nsd     44937  0.0  0.2 37544 18308 ??  Sp      7:45PM    0:00.11 nsd: server 1 (nsd)
> > > 
> > > root@name2:~ # egdb --pid=44937 /usr/sbin/nsd
> > > GNU gdb (GDB) 7.12.1
> > > Copyright (C) 2017 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "x86_64-unknown-openbsd7.0".
> > > Type "show configuration" for configuration details.
> > > For bug reporting instructions, please see:
> > > <http://www.gnu.org/software/gdb/bugs/>.
> > > Find the GDB manual and other documentation resources online at:
> > > <http://www.gnu.org/software/gdb/documentation/>.
> > > For help, type "help".
> > > Type "apropos word" to search for commands related to "word"...
> > > Reading symbols from /usr/sbin/nsd...(no debugging symbols found)...done.
> > > Attaching to program: /usr/sbin/nsd, process 44937
> > > Reading symbols from /usr/lib/libssl.so.50.0...done.
> > > Reading symbols from /usr/lib/libcrypto.so.47.0...done.
> > > Reading symbols from /usr/lib/libevent.so.4.1...done.
> > > Reading symbols from /usr/lib/libc.so.96.1...done.
> > > Reading symbols from /usr/libexec/ld.so...done.
> > > [Switching to thread 563101]
> > > kevent () at /tmp/-:3
> > > 3       /tmp/-: No such file or directory.
> > > 
> > > Anything I am missing?
> > > 
> > > Mischa
> > > 
> > 
> > Do you see a gdb prompt? If so
> > 
> >   continue
> > 
> > should do it (and then wait for the crash).
> > 
> > If you still see the crashes, a tcpdump of the traffic to nsd might
> > help as well; I can replay that locally against nsd. I would also
> > need your nsd config for that.
> > 
> >     -Otto
> > 
> I did some debugging with Mischa.
> 
> Unfortunately I misclicked and deleted the backtrace. However, the
> problem was that query.c calls add_rrset (query.c:736) from
> answer_delegation (query.c:917), where rrset is NULL.
> 
> When looking at the original query it was always a PTR request for
> an IPv6 record. Looking through the zonefile, we tried removing
> some likely suspect entries to see if we could pinpoint the root
> cause, but after re-adding everything it wouldn't crash anymore.
> 
> Adding a simple comment to the zonefile of the second NS server
> yielded the same result: the server won't crash anymore.
> 
> Mischa is going to monitor the situation to see if the issues
> return, but my current best guess is that some weird state got
> cached somewhere somehow and got flushed when saving the
> zonefile.
> 
> martijn@
> 

Maybe some form of corruption in the zonefile that was removed when
saving? Who knows.... Anyway, thanks for taking care.

        -Otto
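
For anyone who lands on this thread later: since the backtrace mentioned
above was lost, one way to avoid that is to let gdb print it
non-interactively so it can be redirected to a file. What follows is only
a minimal sketch, assuming egdb from ports as used earlier in the thread.
bt_command is a hypothetical helper name, and the PID is the "server 1"
child from the ps output quoted above. -batch and -ex are standard gdb
options.

```shell
#!/bin/sh
# Sketch only: bt_command is a hypothetical helper, not part of nsd or gdb.
# It prints an egdb command line that attaches to the given PID, resumes
# the process, and prints a backtrace once the process stops (for example
# on SIGSEGV). -batch makes gdb exit after the -ex commands have run.
bt_command() {
    printf 'egdb -batch -ex continue -ex bt --pid=%s /usr/sbin/nsd\n' "$1"
}

# PID 44937 is the "server 1" child from the ps output in this thread;
# substitute the PID of the child you want to watch.
bt_command 44937
```

Running the printed command as root with its output redirected, for
example to /tmp/nsd-bt.txt (the path is just an example), should leave
the backtrace in a file instead of only on the terminal.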
