In the trace I see a sequence that looks like:

[linux.C:167-G] - Stopped with signal 19
[generator.C:209-G] - Got event
[generator.C:144-G] - Setting generator state to decoding
[generator.C:144-G] - Setting generator state to statesync
[generator.C:144-G] - Setting generator state to queueing
[generator.C:144-G] - Setting generator state to none
[generator.C:144-G] - Setting generator state to process_blocked

I've never seen this before, and I'm not sure what happened.  It
almost looks like ProcControlAPI got an event that it couldn't
understand.  I wonder if this is the missing event from the new
thread.  I'd suggest focusing on this and seeing if you can trace what
happened.

A few minutes after I wrote this I realized what the core problem is.
ProcControlAPI keeps track of "dead threads" in the ProcPool, and uses
this list to suppress events that trickle in from dead multi-threaded
processes (we'd sometimes see Linux feed us queued-up debug events from
threads after a process's main thread died).  As Josh suggested, we're
likely seeing TID reuse and mis-identifying the new thread's first event
as a lingering event from a dead thread.

This makes a tremendous amount of sense.

I seem to recall we've discussed the dead-thread tracking problem before without finding a good way to distinguish a recycled TID from a dead one whose events legitimately should be suppressed.  This test case suggests one simple and obvious solution: discard non-thread-create events from dead threads, and (obviously) remove a thread from the dead list when its TID becomes live again.

Any problems with this approach that you guys see?

-Matt


--
--bw

Bill Williams
Paradyn Project
[email protected]
_______________________________________________
Dyninst-api mailing list
[email protected]
https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api