On 2/12/22 06:08, David Laight wrote:
From: Raffaello D. Di Napoli
Sent: 12 February 2022 01:33


On 2/11/22 16:22, Rob Landley wrote:
On 2/9/22 11:12 AM, Baruch Siach wrote:
Hi Sun,

On Wed, Feb 09 2022, סאן עמר wrote:
Hi, I've been using busybox for a while now (v1.29.2), and I had an issue with
a SIGTERM being sent randomly to one of my processes. I debugged it until I
found that it came from a timeout process that had previously been assigned to
another process with the same pid. (I'm using a lot of timeouts for a lot of
jobs.)
So I looked at the code in "timeout.c", where it sleeps for 1 second in each
iteration and then checks the timeout status. I suspect that during this time
the process that timeout is monitoring terminates and another one with the
same pid is created, which results in an unwanted timeout.
There is a comment in there saying that sleeping for "HUGE NUM" will probably
result in this issue, but I can't see why it won't also occur in the current
case.

There is no change in this behaviour in the latest master.
I would appreciate any help, Sun.
Any reference to PID number is inherently racy.
Not between parent and child.
Except in BB’s timeout, the relationship is not parent/child :)

Much to my surprise, I’ll say that. When I read the bug report the other
day, I thought to myself well, this one ought to be easy to fix. But no,
there’s no SIGCHLD to be handled, no relationship between processes to
be leveraged.

I don’t think this bug can be fixed without a near-complete rewrite, or
without doing a lot of procfs digging to really validate the waited-on
process, since kill(pid, 0) only validates a pid, not a process.
And Linux uses a strict 'next free pid' algorithm for new processes
so there is no guard time between a process exiting and its pid being reused.
This problem was 'fixed' inside the kernel by using a small structure
instead of the pid itself - but that didn't help userspace (or even some
drivers).
By comparison NetBSD uses the high bits of the pid as a 'generation number'
and so guarantees that a pid won't be reused for some time (a few thousand
forks).

You can use the process start time (I think it is in /proc/pid/stat)
to validate the process just before the kill().
That leaves a very small timing window that it is hard to avoid
without using pidfd.
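
A rough sketch of the start-time check suggested above, in C, assuming Linux
procfs. The helper names proc_start_time() and kill_if_same() are made up for
illustration and are not taken from busybox. The start time is field 22 of
/proc/<pid>/stat according to proc(5); the caller would record it once, while
it still knows the pid refers to the right process. The small window between
the check and the kill() is still there.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read starttime (field 22 of /proc/<pid>/stat,
 * in clock ticks since boot). Returns 0 on success, -1 on failure. */
static int proc_start_time(pid_t pid, unsigned long long *start)
{
    char path[64], buf[1024];
    char *p, *end;
    size_t n;
    FILE *f;
    int i;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;              /* no such process (or no procfs) */
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) may contain spaces and ')', so find the last ')'. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ')
        return -1;
    p += 2;                     /* now at field 3 (state) */

    /* Skip 19 fields to get from field 3 to field 22. */
    for (i = 0; i < 19; i++) {
        p = strchr(p, ' ');
        if (!p)
            return -1;
        p++;
    }
    *start = strtoull(p, &end, 10);
    return end == p ? -1 : 0;
}

/* Hypothetical helper: signal pid only if its start time still matches
 * the value recorded earlier. Returns -1 without signalling if the
 * process is gone or the pid now belongs to a different process. */
static int kill_if_same(pid_t pid, unsigned long long expected, int sig)
{
    unsigned long long now;

    if (proc_start_time(pid, &now) < 0 || now != expected)
        return -1;
    /* The pid can still be recycled between the check and this call. */
    return kill(pid, sig);
}
```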

        David

Correctly using pidfd *still* requires that you be the parent process, else the child could get reaped and replaced before the pidfd is created.  As far as I can tell, the only purpose of pidfd is for waking on poll() instead of using signals, which is orthogonal to this problem.

I haven't looked at the source in busybox yet, but it boggles my mind that it wouldn't just be a simple fork+alarm+waitpid because that is literally the least code implementation, and race-free.
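
A minimal sketch of what such a fork+alarm+waitpid design could look like, in
C. This is not the busybox code; run_with_timeout() and on_alarm() are
hypothetical names. One liberty taken here: the parent first waits with
waitid() and WNOWAIT so the child stays an unreaped zombie until the alarm has
been cancelled, which means the SIGALRM handler can never signal a recycled
pid. The child is only reaped with waitpid() afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t child_pid;    /* set before the alarm is armed */

static void on_alarm(int sig)
{
    (void)sig;
    /* kill() is async-signal-safe. The child has not been reaped yet,
     * so child_pid cannot have been reused by another process. */
    kill(child_pid, SIGTERM);
}

/* Run argv with a time limit of secs seconds. Returns the exit status,
 * 128 + signal number if the child was killed, or -1 on error. */
static int run_with_timeout(char *const argv[], unsigned secs)
{
    struct sigaction sa;
    siginfo_t info;
    int status;
    pid_t pid;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    child_pid = pid;
    alarm(secs);

    /* Wait for the child to exit without reaping it (WNOWAIT). */
    while (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0) {
        if (errno != EINTR)
            return -1;
    }
    alarm(0);                   /* no SIGALRM can arrive after this */

    /* Now it is safe to reap the child and release its pid. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Because the monitored command is a direct child that has not been waited for,
the kernel keeps its pid reserved, which is the property the parent/child
argument earlier in the thread relies on.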

-Mike C

_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
