I think you should look at git blame on the relevant lines in the kernel and ask for help from those people and/or loop them onto this thread.

On Mar 19, 2015 8:53 AM, "Pavel Labath" <lab...@google.com> wrote:
> After a considerable amount of effort, I have managed to get a log trace of a rare failure in TestConcurrentEvents (test_two_watchpoints_one_signal_dwarf). Unfortunately, I do not see a clear way to avoid it, so I'm writing here in case anyone has an idea.
>
> Here is my interpretation of the events, backed by log snippets (the full log is in the attachment; I can also provide the corresponding core file if anyone wants to dig around).
>
> The inferior starts out by creating three threads. Of those, two are set up to hit a watchpoint and one signals itself.
>
> 411 1426700114.595431000 [7295/729c]: MonitorChildProcessThreadFunction ::waitpid (pid = -29339, &status, options = 1073741824) => pid = 29346, status = 0x00000a7f (STOPPED), signal = 10, exit_state = 0
>
> On line 411 we see llgs catching the SIGUSR1 (10) signal.
>
> 412 1426700114.595450000 [7295/729c]: NativeProcessLinux::MonitorSignal() received signal SIGUSR1 (10) with code SI_TKILL, (siginfo pid = 29339 (not from llgs), waitpid pid = 29346)
>
> Here we see that the ptrace(GETSIGINFO) call was successful: we get info about the signal and continue on our way, stop all the other threads (which, by the way, have hit a watchpoint), report this to lldb, etc. After some back and forth, lldb tells us to resume the inferior and let it process the SIGUSR1 signal.
>
> 548 1426700116.644772000 [7295/729a]: NativeProcessLinux::Resume() resuming thread = 29346 with signal SIGUSR1
>
> However, here things get strange. Immediately after that we get a second waitpid notification:
>
> 555 1426700116.644896000 [7295/729c]: MonitorChildProcessThreadFunction ::waitpid (pid = -29339, &status, options = 1073741824) => pid = 29346, status = 0x00000a7f (STOPPED), signal = 10, exit_state = 0
>
> This shouldn't happen, as we have just reinjected the signal for the process to handle, and no other signal was sent to the thread in the meantime. But we don't know that, so we try to process the signal as usual, by calling ptrace(GETSIGINFO).
>
> 558 1426700116.644932000 [7295/729c]: ThreadStateCoordinator::EnqueueEvent enqueued event: EventThreadDeath (tid=29346)
> 561 1426700116.644961000 [7295/729c]: NativeProcessLinux::MonitorCallback GetSignalInfo failed: No such process, tid = 29346, signal = 10, status = 0 (thread/process killed, is not main thread, thread metadata removed)
>
> These lines tell us that we are now in the branch where ptrace(GETSIGINFO) failed with ESRCH. This can happen for two reasons:
> - the thread in question has died without us noticing
> - the thread is running
> Since we have just received a waitpid notification, we know that the thread should not be running, so we assume that the thread has died between the waitpid and getsiginfo calls. We clean up our internal structures and forget about the thread.
>
> However, the thread is very much alive and kicking, and we get another notification from it on line 567. Now we try to find the thread in our data structures and assert because it does not exist.
>
> These events would seem to indicate that the SIGUSR1 notification on line 555 was bogus - there should be no additional signal, and even if there was, it should have stopped the thread, but it didn't do that.
>
> So it seems that we should just ignore bogus signals (i.e., assume they don't stop the thread and they don't kill it).
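[For readers following along: the monitor thread's flow described above is essentially a blocking waitpid on the inferior's process group followed by ptrace(GETSIGINFO) on the thread that reported a stop. The sketch below shows just that shape; it is not the actual NativeProcessLinux code, and the names monitor_once/pgid are made up for illustration.]

  /* Rough, simplified sketch of the monitor-thread idiom (hypothetical). */
  #include <errno.h>
  #include <signal.h>
  #include <stdio.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  #ifndef __WALL
  #define __WALL 0x40000000  /* the options = 1073741824 seen in the log */
  #endif

  static void monitor_once(pid_t pgid)
  {
      int status = 0;
      /* Wait for a state change in any thread of the inferior's process
         group, clone threads included. */
      pid_t tid = waitpid(-pgid, &status, __WALL);
      if (tid <= 0 || !WIFSTOPPED(status))
          return;  /* exit/termination handling omitted */

      siginfo_t info;
      if (ptrace(PTRACE_GETSIGINFO, tid, NULL, &info) == 0) {
          /* Normal case: the thread really is stopped on a signal.
             Report it, stop the other threads, wait for a resume. */
          printf("tid %d stopped with signal %d\n", (int)tid, WSTOPSIG(status));
      } else if (errno == ESRCH) {
          /* Ambiguous case: either the thread died between waitpid() and
             ptrace(), or it is not actually stopped at all (the "bogus"
             notification above).  The current code assumes death and drops
             the thread's metadata, which is what leads to the assert when
             the thread reports in again. */
          printf("tid %d: GETSIGINFO failed with ESRCH\n", (int)tid);
      }
  }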
> However, it is not easy to differentiate between the three scenarios:
> - real signal - waitpid returns, getsiginfo succeeds - thread is stopped
> - bogus signal - waitpid returns, getsiginfo fails - thread is running
> - thread death - waitpid returns, getsiginfo fails - thread is dead
>
> Does anyone have an idea how to address this? I have a few, but I don't like any of them very much, so I'd rather keep them to myself for now. Besides, this mail has gotten too long already. If you're still reading this, give yourself a cookie. :)
>
> pl
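For anyone else digging into this, the ambiguity in the last two rows of that list is easy to reproduce outside of lldb. Below is a small standalone demo (hypothetical code, not from the lldb tree) suggesting that PTRACE_GETSIGINFO succeeds only while the tracee is actually in a ptrace-stop, and fails with ESRCH both when the tracee is running and when it no longer exists:

  #include <errno.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  static void query(const char *state, pid_t tid)
  {
      siginfo_t si;
      long r = ptrace(PTRACE_GETSIGINFO, tid, NULL, &si);
      printf("%-7s: GETSIGINFO -> %s\n", state,
             r == 0 ? "ok" : strerror(errno));
  }

  int main(void)
  {
      pid_t child = fork();
      if (child == -1) { perror("fork"); return 1; }
      if (child == 0)
          for (;;) pause();            /* inferior: just stay alive */

      /* Attach and wait for the initial stop. */
      if (ptrace(PTRACE_ATTACH, child, NULL, NULL) == -1) {
          perror("PTRACE_ATTACH");
          return 1;
      }
      waitpid(child, NULL, 0);
      query("stopped", child);         /* tracee is in a ptrace-stop */

      ptrace(PTRACE_CONT, child, NULL, NULL);
      query("running", child);         /* tracee resumed, not stopped */

      kill(child, SIGKILL);            /* tracee killed and reaped */
      waitpid(child, NULL, 0);
      query("dead", child);

      return 0;
  }

Built with a plain cc and run as an unprivileged user (it only traces its own child), this should print "ok" for the stopped case and "No such process" for the other two, i.e. from the tracer's side the running and dead cases look identical after a single failed getsiginfo call, which is exactly the hole Pavel describes.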