On Wed, Jan 29, 2020 at 02:04:06PM +0100, Martin Pieuchot wrote:
> Diff below enables a ptrace(2) regress coming from NetBSD.
>
> With usr.bin/make built since -D2020-01-14, that includes -current, it
> complains during the last test:
>
> make: Child (52049) not in table?
> FAILED
>
> That results in a failing test, however the syscall correctly reports
> EBUSY.
>
> Should I commit this first to help you look at the issue?

At first I thought forgetting to handle WIFSTOPPED might explain things.

But looking more closely, I think the changes in make just made a system
bug more apparent.

By instrumenting make a bit:

Index: job.c
===================================================================
RCS file: /cvs/src/usr.bin/make/job.c,v
retrieving revision 1.159
diff -u -p -r1.159 job.c
--- job.c	16 Jan 2020 16:07:18 -0000	1.159
+++ job.c	29 Jan 2020 13:52:41 -0000
@@ -757,11 +757,15 @@ reap_jobs(void)
 	Job *job;
 
 	while ((pid = waitpid(WAIT_ANY, &status, WNOHANG)) > 0) {
+		fprintf(stderr, "Process %ld said %d\n", (long)pid, status);
+		if (WIFSTOPPED(status) || WIFCONTINUED(status))
+			continue;
 		reaped = true;
 		job = reap_finished_job(pid);
 		if (job == NULL) {
-			Punt("Child (%ld) not in table?", (long)pid);
+			Punt("Child (%ld) with status %d not in table?",
+			    (long)pid, status);
 		} else {
 			handle_job_status(job, status);
 			determine_job_next_step(job);

I see the following pattern:

./t_ptrace -r 6
Mark the parent process (PID 22772) a debugger of PID 93154
Mark the parent process (PID 22772) a debugger of PID 93154 again
Process 93154 said 0
Process 93154 said 0
make: Child (93154) with status 0 not in table?

so waitpid gives me 93154 with status 0 *twice* (so it reaps the same
child twice, as status == 0 corresponds to exit(0)).

I fail to see how I can recover from that (or why I should)...
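
For comparison, here is a minimal sketch of the behaviour one would normally
expect from waitpid(2): a child that calls exit(0) is reported once with
status 0, and a second wait on the same pid fails with ECHILD. This is not
code from make or from the regress test; status_kind() and reap_once_demo()
are hypothetical names made up for illustration, and the sketch assumes no
ptrace(2) involvement and a default SIGCHLD disposition.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: classify a status word returned by waitpid(2).
 * WIFCONTINUED is tested first because, on common implementations, the
 * "continued" encoding also satisfies WIFSTOPPED.
 */
static const char *
status_kind(int status)
{
#ifdef WIFCONTINUED
	if (WIFCONTINUED(status))
		return "continued";
#endif
	if (WIFSTOPPED(status))
		return "stopped";
	if (WIFSIGNALED(status))
		return "signaled";
	if (WIFEXITED(status))
		return "exited";
	return "unknown";
}

/*
 * Hypothetical demo: fork a child that exits with code 0, reap it, then
 * try to reap the same pid again.  Stores the first status and the errno
 * of the second wait.  Returns 0 if the second wait failed (the expected
 * outcome), -1 on any other result.
 */
static int
reap_once_demo(int *first_status, int *second_errno)
{
	pid_t pid, r;

	assert(first_status != NULL && second_errno != NULL);

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(0);		/* child: plain exit(0) equivalent */

	/* First wait: blocks until the child has exited. */
	do {
		r = waitpid(pid, first_status, 0);
	} while (r == -1 && errno == EINTR);
	if (r != pid)
		return -1;

	/* Second wait: the child is already reaped, so this should fail. */
	errno = 0;
	r = waitpid(pid, NULL, WNOHANG);
	*second_errno = errno;
	return (r == -1) ? 0 : -1;
}
```

On a system behaving as expected, reap_once_demo(&st, &err) should leave
st == 0 (for which status_kind() reports "exited") and err == ECHILD, which
is the opposite of the double report seen in the trace above.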