On Wed, Jun 17, 2020 at 04:57:54PM +0200, Joerg Jung wrote:

> 
> > On 17. Feb 2020, at 15:16, Martin Pieuchot <[email protected]> wrote:
> > On 17/02/20(Mon) 14:55, Joerg Jung wrote:
> >> 
> >>> On 26. Sep 2019, at 15:02, Stuart Henderson <[email protected]> wrote:
> >>> On 2019/09/26 13:45, Stuart Henderson wrote:
> >>>> On 2019/09/26 11:16, Joerg Jung wrote:
> >>>>> 
> >>>>> 
> >>>>> I run a few busy (~800 req/s) NSD servers which I upgraded 
> >>>>> to 6.5, all stock/default OpenBSD, e.g. I’ve not tweaked any 
> >>>>> sysctl values and nsd.conf matches the default as well, just 
> >>>>> added a few hundred zones.
> >>>>> 
> >>>>> Now, when I increase servers from default 1 to 2 in nsd.conf: 
> >>>>>         server-count: 2
> >>>>> it starts spamming my log with:
> >>>>>         nsd[62723]: sendto 1.2.3.4 failed: Resource temporarily 
> >>>>> unavailable
> >>>>> 
> >>>>> checking the source, server.c seems not to handle EAGAIN 
> >>>>> after sendto() and does not recover or retry, it just increases
> >>>>> txerr statistic count - so answer seems really lost :(
> >>>>> 
> >>>>> I tried higher debug level, as well as increasing socket buffers to: 
> >>>>>         net.inet.udp.recvspace= 65536
> >>>>>         net.inet.udp.sendspace=65636
> >>>>> but both didn’t help and netstat -s -p udp does show 
> >>>>>         0 dropped due to full socket buffers  
> >>>>> anyways. So, I don’t believe this is a socket buffer issue.
> >>>>> 
> >>>>> The same server-count: 2 setting worked fine with 6.3.
> >>>>> 
> >>>>> Any hints, insights, or pointers?
> >>>>> Does anyone else experience the same?
> >>>>> 
> >>>> 
> >>>> Maybe it's worth trying to track down further whether this is due to an
> >>>> NSD change or something else in the OS - cvs up -r OPENBSD_6_3 .. (be 
> >>>> sure
> >>>> to use "make -f Makefile.bsd-wrapper [..]" when building).
> >>>> 
> >>> 
> >>> Or, following a comment from claudio@, try a kernel built with this:
> >> 
> >> FYI, I tried that diff and a few other things but neither did help. 
> 
> FYI, after upgrade to 6.7 the issue is still the same, 
> just with different format for the syslog message not 
> explicitly mentioning sendto() anymore, the new error
> log repeating gazillion of times looks like this:
>     nsd[55919]: sendmmsg [0]=1.2.3.4 count=1 failed: Resource temporarily 
> unavailable
> 
> > Did you ktrace(1) the problem?
> 
> Yes, please see below.
> 
> > How is sendto(2) called, in particular
> > is there any MSG_DONTWAIT or FNONBLOCK set on the file descriptor?  
> 
> From the ktrace dump it seems not sendto() being the culprit, but recvfrom().
> Neither, sendto() nor recvfrom() seem to have any flags set.

I think this is an error in your analysis. It's perfectly normal for
recvform to return EAGAIN.

On the sending side it indicates an issue. 

        -Otto


> 
> > Does
> > that mean the kernel returns EWOULDBLOCK even if the userland said it is
> > fine to block?
> 
> I don’t think so. 
> 
> 
> From my limited understanding it seems the two processes 
> seem to concurrently try to recvfrom() after BOTH received kevent() 
> notification with only one being the “winner” and the other one 
> logging the failing?
> 
> Shouldn’t only one of the two try to call recvfrom()?
> 
> Not sure if this is a bug in OpenBSD or NSD, but
> it worked fine in earlier releases.
> 
> However, this one seems related:
> https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=385 
> <https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=385>
> 
> 
> 
>  52438 nsd      STRU  struct kevent { ident=5, filter=EVFILT_READ, 
> flags=0x1<EV_ADD>, fflags=0<>, data=62
> , udata=0x30562fe4198 }
>  55919 nsd      STRU  struct kevent { ident=5, filter=EVFILT_READ, 
> flags=0x1<EV_ADD>, fflags=0<>, data=62
> , udata=0x30562fe4198 }
>  52438 nsd      RET   kevent 1
>  55919 nsd      RET   kevent 1
>  52438 nsd      CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7ffffbf1a0)
>  55919 nsd      CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7ffffbf1a0)
>  52438 nsd      STRU  struct timespec { 1224.307942259 }
>  55919 nsd      STRU  struct timespec { 1224.307946954 }
>  52438 nsd      RET   clock_gettime 0
>  55919 nsd      RET   clock_gettime 0
>  52438 nsd      CALL  
> recvfrom(5,0x30467ecd000,0x20109,0,0x304e553dc18,0x302669c2aa8)
>  55919 nsd      CALL  
> recvfrom(5,0x30497a9d000,0x20109,0,0x304e553dc18,0x302669c2aa8)
>  52438 nsd      GIO   fd 5 read 46 bytes
>        
> "\aY\0\^P\0\^A\0\0\0\0\0\^A\^Dwpad\^Efoo\^Fbar\0\0\^\\0\^A\0\0)\^P\0\0\0\M^@\0\0\0"
>  55919 nsd      RET   recvfrom -1 errno 35 Resource temporarily unavailable
>  52438 nsd      STRU  struct sockaddr { AF_INET, 1.2.3.4:40276 }
>  55919 nsd      CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7ffffbf1a0)
>  52438 nsd      RET   recvfrom 46/0x2e
>  55919 nsd      STRU  struct timespec { 1224.307967366 }
>  55919 nsd      RET   clock_gettime 0
>  52438 nsd      CALL  gettimeofday(0x7f7ffffbeed0,0)
>  55919 nsd      CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7ffffbf1a0)
>  52438 nsd      STRU  struct timeval { 1592402542<"Jun 17 10:02:22 
> 2020">.287233 }
>  55919 nsd      STRU  struct timespec { 1224.307977172 }
>  52438 nsd      RET   gettimeofday 0
>  55919 nsd      RET   clock_gettime 0
>  52438 nsd      CALL  sendto(5,0x30467ecd000,0x2e,0,0x304e553dc18,0x10)
>  55919 nsd      CALL  kevent(11,0,0,0x30549983000,64,0x7f7ffffbf0f8)
>  52438 nsd      STRU  struct sockaddr { AF_INET, 1.2.3.4:40276 }
>  55919 nsd      STRU  struct timespec { 109.487449000 }
>  52438 nsd      GIO   fd 5 wrote 46 bytes
>        
> "\aY\M^@\^E\0\^A\0\0\0\0\0\^A\^Dwpad\^foo\^Fbar\0\0\^\\0\^A\0\0)\^P\0\0\0\M^@\0\0\0"
>  52438 nsd      RET   sendto 46/0x2e
> 
> 
> 
> 
> >>> Index: syscalls.master
> >>> ===================================================================
> >>> RCS file: /cvs/src/sys/kern/syscalls.master,v
> >>> retrieving revision 1.189
> >>> diff -u -p -r1.189 syscalls.master
> >>> --- syscalls.master       11 Jan 2019 18:46:30 -0000      1.189
> >>> +++ syscalls.master       26 Sep 2019 13:01:46 -0000
> >>> @@ -261,7 +261,7 @@
> >>> 130       OBSOL           oftruncate
> >>> 131       STD             { int sys_flock(int fd, int how); }
> >>> 132       STD             { int sys_mkfifo(const char *path, mode_t 
> >>> mode); }
> >>> -133      STD NOLOCK      { ssize_t sys_sendto(int s, const void *buf, \
> >>> +133      STD             { ssize_t sys_sendto(int s, const void *buf, \
> >>>                       size_t len, int flags, const struct sockaddr *to, \
> >>>                       socklen_t tolen); }
> >>> 134       STD             { int sys_shutdown(int s, int how); }
> >>> 
> >>> 
> >>> Run "make syscalls" in sys/kern before building.
> 

Reply via email to