Robert,

On 9/29/21 11:30, Robert Elz wrote:
| So we caught where the queue is closed, and traced it back to
| getaddrinfo(). That call both closes fd#3, creates a new kqueue
| and leaves it open. This is the back trace from close:
|
| #0 0x0000732d69c07892 in close () from /usr/lib/libpthread.so.1

From this I'm guessing that freeradius is multi-threaded ?
It's capable of it, and is linked with -lpthread, but we are running it in a single-threaded mode.
| The full stack traces and ktraces can be found here:
| https://github.com/FreeRADIUS/freeradius-server/issues/4244

I saw some helpful data there, but hardly a full ktrace.
Do you think the full ktrace will be helpful? I can sanitize it and share it if needed.
| Our next step is to recompile libc with debugging symbols and start
| poking around there to see why is it closing an fd that doesn't
| belong to it, but if somebody knows why that might happen -
| that'd be great.

Is it possible that something at startup is closing fds, but that might be happening after the DNS resolver has been initialised ?
There are no explicit close() calls after fork(), at least not on FD#3.
As you saw the libc address lookup routines leave the fd open, and if something as part of a "make sure all fd's > 2 are closed at startup" type functionality went and closed it, that would cause a problem. The kqueue fd is used to monitor /etc/resolv.conf for any changes that would require (or might require) it to be re-read (which is a useful thing to do for long running daemons) - so it needs to remain open for the life of the process.
I think my other email provides a reasonable explanation of what is going on. It seems to me that the sequence getaddrinfo() - fork() - getaddrinfo() will always cause issues: the kqueue FD is not inherited across fork(), but the libc address lookup routines keep the old descriptor number around and still treat it as their own in the child. [skipped]
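To make the fork() part of that sequence easy to check, here is a minimal sketch of a helper that reports whether a given descriptor is still open in a forked child. The names fd_is_open() and fd_survives_fork() are made up for illustration and are not part of libc or FreeRADIUS. An ordinary descriptor such as a pipe end should report 1; on the BSDs a descriptor returned by kqueue() would be expected to report 0, since kqueue(2) documents that the queue is not inherited by a child created with fork().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is an open descriptor in this process, else 0. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/*
 * Hypothetical helper: fork, and report whether fd is still open in the child.
 * Returns 1 (open in child), 0 (not open in child), -1 on fork/wait failure.
 */
int fd_survives_fork(int fd)
{
    pid_t pid;
    int status;

    assert(fd >= 0);

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(fd_is_open(fd) ? 0 : 1);  /* child reports via exit status */

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

If the child reports 0 for the resolver's kqueue, then the number the resolver remembered is free for reuse there, and whatever the application opens next can land on it.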