Craig Campbell wrote:
> I was hoping to build a version that could fork children, but not spawn
> threads.

The server can "exec" child shell scripts. It *cannot* run multiple RADIUS servers as child processes.

> There are known 'challenges' in using the fork command in multi threaded
> environments. (As opposed to a process that forks children for
> different processing branches.) A couple of years ago I had an
> extremely challenging time modifying an existing threaded application to
> additionally fork off children to perform certain other tasks.

The challenge is in ensuring that the right thread catches the right child exit.

If you run the server with "radiusd -s", it won't spawn threads.

> The issue I am seeing of stranded/hung children looks similar (that is
> not to say I have caught the culprit... just suspicion at this point).
> The issue seems to happen only sometimes during bursts of increased
> load. (Same as my previous experience.)

It may be a race condition under heavy load. But I don't see why... the thread that forks then waits for the child to exit, and grabs the exit code. This should ensure that the child is reaped, rather than staying around as a zombie.

> If I were to GUESS, at this point I'd look for interrupts that result in
> children when mutex locks are in place and unintentionally inherited by
> the child process.

Except that the server doesn't fork and then have the child keep running. It forks, and the child immediately exec()s the shell script. If the shell script cannot be executed, the child *still* exits. The child doesn't obtain *or* check mutexes between the fork() and the exec(). It does almost *nothing*: there are only 100 lines of code between the fork() and the exec().

> (My solution was to acquire ALL locks before a fork,
> then have the child and parent clear them all after) - see man
> pthread_atfork section: RATIONALE if you have access to a Linux system).

That is for long-running children. We don't do that.

> I cannot explain why apparently no one else is seeing the issue I am
> chasing. As far as I can tell, my configuration is quite basic.

Kernel bugs?
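For readers following the fork/exec/wait argument above, here is a minimal sketch of that general POSIX pattern. It is not FreeRADIUS's actual code: the helper name `run_script` and the use of `/bin/sh -c` are assumptions for illustration. The child does nothing between fork() and exec() except exit if the exec fails, and the parent waits on that specific PID and collects the exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not FreeRADIUS code):
 * run `cmd` via /bin/sh and return its exit status, or -1 on error
 * or if the child was killed by a signal.
 */
int run_script(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork() itself failed */

    if (pid == 0) {
        /* Child: take no locks, touch no shared state; exec immediately. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        /* Only reached if exec failed: exit anyway so no stray child survives. */
        _exit(127);
    }

    /* Parent: reap this specific child so it cannot linger as a zombie. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (r < 0)
        return -1;                      /* e.g. ECHILD: reaped elsewhere */

    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* normal exit: 0..255 */
    return -1;                          /* terminated by a signal */
}
```

A call such as `run_script("/path/to/script.sh")` (hypothetical path) returns the script's exit status. Waiting on the specific PID, rather than calling wait() or waitpid(-1, ...), is what keeps one thread from reaping another thread's child. If some other part of the program reaps children itself, or sets SIGCHLD to SIG_IGN, this waitpid() fails with ECHILD and the helper returns -1.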

Possible race conditions in the code?

> I am now trying a run with the -s option but, if successful, it won't
> tell us much about why.

If it works...

Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html

