On Wed, Jan 29, 2020 at 02:04:06PM +0100, Martin Pieuchot wrote:
> Diff below enables a ptrace(2) regress coming from NetBSD.
> 
> With usr.bin/make built since -D2020-01-14, that includes -current, it
> complains during the last test:
> 
>       make: Child (52049) not in table?
>       FAILED
> 
> That results in a failing test, however the syscall correctly reports
> EBUSY.
> 
> Should I commit this first to help you look at the issue?

At first I thought make not handling WIFSTOPPED might explain this.

But looking more closely, I think the changes in make just made a system 
bug more apparent.

I instrumented make a bit with the following diff:

Index: job.c
===================================================================
RCS file: /cvs/src/usr.bin/make/job.c,v
retrieving revision 1.159
diff -u -p -r1.159 job.c
--- job.c       16 Jan 2020 16:07:18 -0000      1.159
+++ job.c       29 Jan 2020 13:52:41 -0000
@@ -757,11 +757,15 @@ reap_jobs(void)
        Job *job;
 
        while ((pid = waitpid(WAIT_ANY, &status, WNOHANG)) > 0) {
+               fprintf(stderr, "Process %ld said %d\n", (long)pid, status);
+               if (WIFSTOPPED(status) || WIFCONTINUED(status))
+                       continue;
                reaped = true;
                job = reap_finished_job(pid);
 
                if (job == NULL) {
-                       Punt("Child (%ld) not in table?", (long)pid);
+                       Punt("Child (%ld) with status %d not in table?", 
+                           (long)pid, status);
                } else {
                        handle_job_status(job, status);
                        determine_job_next_step(job);


With that diff applied, I see the following pattern:
./t_ptrace -r 6
Mark the parent process (PID 22772) a debugger of PID 93154
Mark the parent process (PID 22772) a debugger of PID 93154 again
Process 93154 said 0
Process 93154 said 0
make: Child (93154) with status 0 not in table?


So waitpid returns 93154 with status 0 *twice*. In other words, the same
child is reaped twice, since status == 0 corresponds to exit(0).
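
For reference, here is a minimal sketch of the behaviour I would expect
instead. It is not code from make: the names classify_status() and
reap_twice() and the enum labels are made up for this illustration. It
decodes a wait status with the standard <sys/wait.h> macros, then shows
that waiting a second time for a child that has already been reaped
should fail with ECHILD rather than hand back the same pid again.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <errno.h>
#include <unistd.h>

/* Hypothetical labels for this sketch; they are not names used by make. */
enum status_kind {
	ST_EXITED, ST_SIGNALED, ST_STOPPED, ST_CONTINUED, ST_UNKNOWN
};

/* Decode a wait status with the standard <sys/wait.h> macros. */
enum status_kind
classify_status(int status)
{
	if (WIFEXITED(status))
		return ST_EXITED;
	if (WIFSIGNALED(status))
		return ST_SIGNALED;
	if (WIFSTOPPED(status))
		return ST_STOPPED;
#ifdef WIFCONTINUED
	if (WIFCONTINUED(status))
		return ST_CONTINUED;
#endif
	return ST_UNKNOWN;
}

/*
 * Fork a child that exits 0, reap it, then wait for the same pid again.
 * Returns the errno of the second waitpid, 0 if it unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
int
reap_twice(void)
{
	pid_t pid, r;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(0);

	/* First wait: blocks until the child exits, then reaps it. */
	do {
		r = waitpid(pid, &status, 0);
	} while (r == -1 && errno == EINTR);
	if (r != pid || classify_status(status) != ST_EXITED ||
	    WEXITSTATUS(status) != 0)
		return -1;

	/* Second wait: the child is already gone, so this should fail. */
	r = waitpid(pid, &status, WNOHANG);
	return r == -1 ? errno : 0;
}
```

On a system that behaves as I expect, classify_status(0) reports a normal
exit and reap_twice() returns ECHILD. The log above looks as if the
second waitpid succeeded instead.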

I fail to see how I can recover from that (or why I should)...
