In article <YUCAJHVuBAyGEtpL@gloria>,
Dima Veselov <kab...@lich.phys.spbu.ru> wrote:
>Greetings,
>
>I do not know if this is NetBSD-related, but I suffer from FreeRADIUS
>instability on NetBSD for a long time and do not know how to debug this.
>
>Symptoms are: RADIUS server randomly (once a day or once a week) can stop
>answering and this is not connected to the actual load. While in that state
>it can be killed with -9 only, other signals do nothing, rc.d restart script
>just hang.
>
>I have compiled debug version of it and connected gdb:
>
>0x000077280da42b8a in _sys___kevent50 () from /usr/lib/libc.so.12
>(gdb) bt
>#0 0x000077280da42b8a in _sys___kevent50 () from /usr/lib/libc.so.12
>#1 0x000077280e807879 in __kevent50 () from /usr/lib/libpthread.so.1
>#2 0x00007728106270e1 in fr_event_loop (el=0x7728105bcb20)
> at src/lib/event.c:625
>#3 0x00000000004364dd in radius_event_process () at src/main/process.c:6056
>#4 0x00000000004466c3 in main (argc=<optimized out>, argv=<optimized out>)
> at src/main/radiusd.c:641
>
>gdb always show it is stuck in kevent call. radiusd was started with -txx
>meaning no threads were used.
>
>src/lib/event.c:625 says:
>
>rcode = kevent(el->kq, NULL, 0, el->events, FR_EV_MAX_FDS, ts_wake);
>
>It seems kevent call is misused somehow leading to not returning from
>this syscall or syscall is blocked. What I can debug further?

Well, it seems that the signals are blocked, and that has nothing to do
with kevent (FreeRADIUS probably blocks them explicitly). You can use

	ps -p $pid-of-freeradius -o sigmask,sigcatch,sigignore

to see which signals are blocked, caught, and ignored.

Why kevent is stuck is a different story. You can use

	fstat -p $pid-of-freeradius

to see what files it has open; perhaps this will provide a clue.

christos
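
For reading the sigmask, sigcatch and sigignore columns, here is a rough
Python sketch that turns a hex mask into signal names. It assumes the
value is a single hexadecimal bitmask in which bit n-1 stands for signal
n; I have not checked how ps formats wider signal sets, and the helper
name decode_sigmask is made up for illustration. Signal names come from
the machine running the script, so run it on the same OS as the process
being examined.

```python
import signal


def decode_sigmask(hex_mask: str) -> list[str]:
    """Return the names of the signals whose bits are set in hex_mask.

    Assumption: bit (n - 1) of the mask corresponds to signal number n.
    """
    mask = int(hex_mask, 16)  # accepts "4001" as well as "0x4001"
    names = []
    signo = 1
    while mask:
        if mask & 1:
            try:
                names.append(signal.Signals(signo).name)
            except ValueError:
                # Number with no symbolic name on this platform.
                names.append(f"SIG{signo}")
        mask >>= 1
        signo += 1
    return names


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(arg, " ".join(decode_sigmask(arg)) or "(none)")
```

If SIGTERM shows up in the decoded sigmask of the hung radiusd, that
would fit the observation that only kill -9 works, since SIGKILL cannot
be blocked.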