Hey, first of all, thanks for your answers, and sorry for how I phrased my
question.
Rob, I still need a clarification, if you don't mind. Everything you
mention would make sense to me if the timeout were implemented so that the
actual command is a child process of the process doing the timeout logic.
That is the case for other implementations I have seen (for example here
<http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/timeout.c;h=98c064a18d54e64e151d8340b8151f9b00240885;hb=HEAD>
).
But in the busybox implementation, the timeout is a child process of the
command it runs (actually a grandchild). So the process that represents the
command can finish, and its parent can release its PID (via wait), while the
process that represents the timeout misses it. All of this can happen
without anyone ignoring SIGCHLD.
Am I wrong?
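
To make the question concrete, here is a minimal C sketch. It is my own
illustration, not busybox code, and the names probe_pid_lifetime and
struct probe_result are hypothetical. It probes a child's PID with
kill(pid, 0) before and after the parent reaps it. While the zombie exists
the probe succeeds; once waitpid() returns, the PID is released, so a
watcher that is not the parent and only polls now and then has no guarantee
that the PID still names the same process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper type: results of probing a PID with kill(pid, 0). */
struct probe_result {
    int while_unreaped; /* kill() return value before the parent reaps */
    int after_reap;     /* kill() return value after the parent reaps */
    int after_errno;    /* errno left by the second probe */
};

/* Fork a child that exits at once, then probe its PID before and after
 * the parent calls waitpid(). Returns 0 on success, -1 on error. */
static int probe_pid_lifetime(struct probe_result *out)
{
    /* Make sure children are not auto-reaped (the SIG_IGN case). */
    signal(SIGCHLD, SIG_DFL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0); /* child: exit immediately */

    /* The child is either still running or a zombie; in both cases
     * its PID is reserved, so the probe succeeds. */
    out->while_unreaped = kill(pid, 0);

    /* Reaping releases the PID; the kernel may reuse it from here on. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    errno = 0;
    out->after_reap = kill(pid, 0);
    out->after_errno = errno;
    return 0;
}
```

On a quiet system the second probe normally fails with ESRCH. On a busy
system the PID could already belong to an unrelated process, in which case
the probe would succeed again, and that is the case I am asking about.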


On Fri, Feb 11, 2022 at 23:19, Rob Landley <[email protected]> wrote:

> On 2/9/22 11:12 AM, Baruch Siach wrote:
> > Hi Sun,
> >
> > On Wed, Feb 09 2022, סאן עמר wrote:
> >> Hi, I'm using busybox for a while now (v1.29.2). and I had an issue with
> >> a sigterm send randomly to a process of mine. I debugged it until I found
> >> it from the timeout process which was assigned before to another process
> >> with the same pid. (i'm using a lot of timeouts for a lot of jobs)
> >> so i looked at the code, "timeout.c" file where it sleep for 1 second in
> >> each iteration then check the timeout status. I suspect at this time the
> >> process timeout monitoring is terminated, but another one with the same
> >> pid is already created. which creates unwanted timeout.
> >>
> >> There is a comment in there about sleep for "HUGE NUM" will probably
> >> result in this issue, but I can't see why it won't occur also in the
> >> current case.
> >>
> >> there is no change of this behaviour in the latest master.
> >> i would appreciate any help, sun.
> >
> > Any reference to PID number is inherently racy.
>
> Not between parent and child. That's why zombies need reaping: a child process
> will not exit until the parent accepts its exit code and timing data via wait()
> (or itself exits). You can leave a zombie process lying around for literally
> years and the PID won't be reused while the zombie still exists.
>
> You can set SIGCHLD to SIG_IGN to allow your child processes to asynchronously
> self-reap, which can make it disappear out from under you. And that signal mask
> can be inherited by child processes, just like you can leak filehandles and
> environment variables and tty state and cpu mask and nice level and a dozen
> other things into your child process state: the parent process defines the child
> state back to PID 1.
>
> If a parent process ignores SIGCHLD and spawns child processes that run timeout,
> in that case the child process of the timeout could cycle out from under
> timeout. I'd look for that in this instance.
>
> One workaround would be timeout.c explicitly doing signal(SIGCHLD,
> SIG_DFL) to correct its inherited signal mask, but the parent process doing that
> is pilot error, and that "fix" is about like a program using fcntl(SETFL) to
> strip O_NONBLOCK off stdout just in case it was called with broken standard
> filehandles: that kind of defensive programming isn't generally part of
> busybox.
>
> > There is no solution for
> > your problem in the traditional POSIX API.
>
> The preponderance of the evidence is that the posix API people have used for ~50
> years and sent into space and such can indeed work reliably for people who know
> how to use it.
>
> Next time you want to say "Linux/posix is fundamentally broken, there is no
> solution in existing deployed kernels, this command busybox has had for 14 years
> can't ever have worked reliably and nobody noticed before", why not try phrasing
> it as a question instead of a statement of fact?
>
> Rob
>
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
