On 2/9/22 11:12 AM, Baruch Siach wrote:
> Hi Sun,
>
> On Wed, Feb 09 2022, סאן עמר wrote:
>> Hi, I'm using busybox for a while now (v1.29.2). and I had an issue with a
>> sigterm send randomly to a process of mine. I debugged it until I found
>> it from the timeout process which was assigned before to another process
>> with the same pid. (i'm using a lot of timeouts for a lot of jobs)
>> so i looked at the code, "timeout.c" file where it sleep for 1 second in
>> each iteration then check the timeout status. I suspect at this time the
>> process timeout monitoring is terminated, but another one with the same pid
>> is already created. which creates unwanted timeout.
>>
>> There is a comment in there about sleep for "HUGE NUM" will probably result
>> in this issue, but I can't see why it won't occur also in the current
>> case.
>>
>> there is no change of this behaviour in the latest master.
>> i would appreciate any help, sun.
>
> Any reference to PID number is inherently racy.
Not between parent and child. That's why zombies need reaping: a child's PID is not released until the parent accepts its exit code and timing data via wait() (or the parent itself exits). You can leave a zombie process lying around for literally years and the PID won't be reused while the zombie still exists.

You can set SIGCHLD to SIG_IGN to allow your child processes to asynchronously self-reap, which can make a child disappear out from under you. And that signal disposition can be inherited by child processes, just like you can leak filehandles and environment variables and tty state and cpu mask and nice level and a dozen other things into your child process state: the parent process defines the child state, all the way back to PID 1.

If a parent process ignores SIGCHLD and spawns child processes that run timeout, then the child process of the timeout could cycle out from under timeout. I'd look for that in this instance.

One workaround would be for timeout.c to explicitly call signal(SIGCHLD, SIG_DFL) to correct its inherited signal disposition, but the parent process doing that is pilot error, and that "fix" is about like a program using fcntl(F_SETFL) to strip O_NONBLOCK off stdout just in case it was called with broken standard filehandles: that kind of defensive programming isn't generally part of busybox.

> There is no solution for
> your problem in the traditional POSIX API.

The preponderance of the evidence is that the POSIX API, which people have used for ~50 years and sent into space and such, can indeed work reliably for people who know how to use it.

Next time you want to say "Linux/posix is fundamentally broken, there is no solution in existing deployed kernels, this command busybox has had for 14 years can't ever have worked reliably and nobody noticed before", why not try phrasing it as a question instead of a statement of fact?

Rob
