Hello Patrick,

On 03/18, Patrick Donnelly wrote:
>
> We are currently trying to debug a problem with ptrace that I believe
> was incidentally fixed by you and maybe Tejun Heo in Linux v3.1.
So let me add Tejun and lkml,

> The
> issue is on github [2] but I will describe it here briefly. My hope is
> that you may remember fixing this and a patch may be made for v3.0.
> [An HPC center is using Linux v3.0 which exhibits this.]

Heh, sorry, I can't recall anything related ;)

> The basic problem is that the application we are tracing spawns
> threads which **sometimes** are not traced (or lost). For Linux v3.0,
> we are using PTRACE_ATTACH and the PTRACE_O_TRACE(CLONE|FORK|VFORK)
> options to follow children [3]. The problem we see is that we will
> receive a PTRACE_EVENT_CLONE event that a thread is created but we
> receive no other events for the thread.

IOW, the new thread does not report the SIGSTOP injected by the implicit
attach?

> What's worse is that the
> thread eventually "comes back" via a PTRACE_EVENT_CLONE when it clones
> its own thread.

OK, so at least the new child is traced too, and it also has the PT_TRACE_*
flags copied from its parent.

> Do you recall fixing anything for v3.1 that might cause this problem?

No. I do not see how the new tracee can miss that SIGSTOP/TIF_SIGPENDING.

To clarify, the usage of SIGSTOP in ptrace was always buggy by design.
For example, SIGCONT from somewhere can remove the pending (and not yet
reported) SIGSTOP, and this _can_ explain the problem you hit. But unless
you use PTRACE_SEIZE the same can happen on v3.1, so it seems there is
something else.

It would be nice to have a test-case :/

Oleg.

> [1] http://ccl.cse.nd.edu/software/parrot/
> [2] https://github.com/cooperative-computing-lab/cctools/issues/1207
> [3]
> https://github.com/cooperative-computing-lab/cctools/blob/f82288167b1b5abb836b1d9b8135c98f71ed90f6/parrot/src/tracer.c#L91-L128
>
> --
> Patrick Donnelly
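For reference, here is a minimal sketch of the implicit-attach behaviour discussed above. Without PTRACE_SEIZE, a task that is auto-attached through a PTRACE_O_TRACE* option first reports a SIGSTOP to the tracer, and that is the pending signal a stray SIGCONT can discard. To keep it self-contained, the sketch makes some assumptions that differ from the tracer in [3]. It uses PTRACE_TRACEME instead of PTRACE_ATTACH, and fork() with PTRACE_O_TRACEFORK instead of clone() threads with PTRACE_O_TRACECLONE. The kernel queues the same initial SIGSTOP for both kinds of auto-attached child. The helper name follow_fork_demo() is hypothetical. The sketch is not a reproducer for the lost-thread problem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any library). Returns 1 if the task that
 * was auto-attached via PTRACE_O_TRACEFORK first reported SIGSTOP, 0 if it
 * did not, and -1 if ptrace could not be set up.
 */
static int follow_fork_demo(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Tracee: become traced (non-SEIZE) and stop so options can be set. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        fork();   /* the new task is auto-attached and gets a pending SIGSTOP */
        _exit(0); /* both the tracee and its new child exit immediately */
    }

    int status;
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACEFORK) == -1) {
        kill(child, SIGKILL);
        return -1;
    }
    ptrace(PTRACE_CONT, child, NULL, NULL); /* suppress the raise()d SIGSTOP */

    pid_t new_pid = -1, first_stopped = -1;
    for (;;) {
        pid_t p = waitpid(-1, &status, __WALL);
        if (p == -1) {
            if (errno == EINTR)
                continue;
            break; /* ECHILD: no children or tracees left */
        }
        if (!WIFSTOPPED(status))
            continue; /* a tracee exited or was killed */

        int sig = WSTOPSIG(status);
        int event = status >> 16;
        if (p == child && event == PTRACE_EVENT_FORK) {
            unsigned long msg = 0;
            ptrace(PTRACE_GETEVENTMSG, child, NULL, &msg);
            new_pid = (pid_t)msg;
            assert(new_pid > 0); /* the event message carries the new PID */
            ptrace(PTRACE_CONT, p, NULL, NULL);
        } else if (p != child && sig == SIGSTOP && first_stopped == -1) {
            /* Initial stop of the auto-attached task. It may be reported
               before or after PTRACE_EVENT_FORK, so do not rely on order. */
            first_stopped = p;
            ptrace(PTRACE_CONT, p, NULL, NULL);
        } else {
            /* Other stops: re-inject ordinary signals, drop event stops. */
            ptrace(PTRACE_CONT, p, NULL, (void *)(long)(event ? 0 : sig));
        }
    }
    return (new_pid > 0 && first_stopped == new_pid) ? 1 : 0;
}
```

On a kernel that permits ptrace, calling follow_fork_demo() is expected to return 1. The loop does not assume that PTRACE_EVENT_FORK arrives before the new task's SIGSTOP. A tracer that blocks waiting for that SIGSTOP in a fixed order, or that never sees it because a SIGCONT removed it, could lose track of the new task in the way described above. With PTRACE_SEIZE, the auto-attached task reports a PTRACE_EVENT_STOP instead of relying on a queued SIGSTOP.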

