On Mon, Mar 21, 2016 at 3:07 PM, Oleg Nesterov <[email protected]> wrote:
> On 03/21, Patrick Donnelly wrote:
>>
>> That seems to be the case but it will only report certain events (not
>> syscalls). I have observed PTRACE_EVENT_EXIT and PTRACE_EVENT_CLONE
>> events... Hmm, now that I think about this, it would be necessary to
>> see the initial SIGSTOP (or PTRACE_EVENT_STOP) in order to initiate
>> syscall tracing via PTRACE_SYSCALL. So that does seem to indicate the
>> problem.
>
> Yes, exactly, you need to see the initial SIGSTOP or another event which
> can be reported before it.

Assuming a SIGSTOP is being silenced, is there anything we can do to
forcibly start tracing syscalls? (For kernels without PTRACE_SEIZE)

>> > To clarify, the usage of SIGSTOP in ptrace was always buggy by design.
>> > For example, SIGCONT from somewhere can remove the pending (and not yet
>> > reported) SIGSTOP, and this _can_ explain the problem you hit.
>>
>> The tree of processes being traced do not send any signals but an
>> external process may have.
>
> I am looking into
>
>     https://github.com/cooperative-computing-lab/cctools/blob/5ccb04599ba2ee125730981f53add80d98cf8161/parrot/src/pfs_main.cc
>
> and this code
>
>     case SIGSTOP:
>         /* Black magic to get threads working on old Linux kernels... */
>
>         if(p->nsyscalls == 0) { /* stop before we begin running the process */
>             debug(D_DEBUG, "suppressing bootstrap SIGSTOP for %d",pid);
>             signum = 0; /* suppress delivery */
>             kill(p->pid,SIGCONT);
>         }
>         break;
>
> doesn't look right. Note that kill(pid,SIGCONT) affects the whole
> thread-group. So if this kill() races with another thread doing clone()
> you can hit the problem you described.

You're right, that should be tkill! I will give that a try and report
back if that solved the issue for our collaborators...

>> > But unless you use PTRACE_SEIZE the same can happen on v3.1 so it seems
>> > there is something else.
>>
>> Okay, it might be that PTRACE_SEIZE fixes it.
>
> Yes, but iiuc you do not see this problem on v3.1 even with PTRACE_ATTACH?

I have not tested on >v3.1 with PTRACE_ATTACH. As you know, v3.1 was
when the PTRACE_SEIZE code was merged along with many other changes. [I
actually thought the merge occurred in 3.4 because of the ptrace man
page. I have submitted a bug report to get that fixed.] I have not had
any reports of the problem with Linux versions after and including
v3.1.

Again, I will see if the kill system call was the cause and report back
if so.

Thanks for taking the time to look at the code!

-- 
Patrick Donnelly
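
For reference, a minimal sketch of the kill-to-tkill change discussed
above. It is not the actual patch: the helper names signal_one_thread()
and current_tid() are hypothetical and do not appear in pfs_main.cc. The
sketch assumes Linux and goes through syscall(2), since glibc provides
no tkill() wrapper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from pfs_main.cc): deliver signum to the single
 * thread with kernel thread id tid, instead of to its whole thread group
 * as kill(2) would. Returns 0 on success, -1 with errno set on failure.
 */
int signal_one_thread(pid_t tid, int signum)
{
	return (int) syscall(SYS_tkill, tid, signum);
}

/* Hypothetical helper: kernel thread id of the calling thread. */
pid_t current_tid(void)
{
	return (pid_t) syscall(SYS_gettid);
}
```

In the SIGSTOP case quoted above, the idea would be to call
signal_one_thread(pid, SIGCONT) in place of kill(p->pid,SIGCONT),
assuming pid there is the thread id reported by the wait call. Whether
that is enough to close the race has not been confirmed yet.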

