Can't type acronyms.  SBPAWFR should be SCPAWFR, dunno why I got one letter too 
eager...

Jim


> On Apr 28, 2021, at 7:20 PM, Jim Ingham via lldb-dev 
> <lldb-dev@lists.llvm.org> wrote:
> 
> I have a question about the handling of SendContinuePacketAndWaitForResponse 
> when an attempt to interrupt the process times out.
> 
> One oddity I noticed is in the handling of the eStateInvalid return from 
> SendContinuePacketAndWaitForResponse (in ProcessGDBRemote::AsyncThread).  
> When there's an async interrupt request in flight and the ReadPacket in 
> SBPAWFR wakes up from its periodic timeout and it's been "too long" with 
> nothing from debugserver, SBPAWFR returns eStateInvalid.  The code handling 
> that in the AsyncThread does:
> 
>              case eStateInvalid: {
>                // Check to see if we were trying to attach and if we got back
>                // the "E87" error code from debugserver -- this indicates that
>                // the process is not debuggable.  Return a slightly more
>                // helpful error message about why the attach failed.
>                if (::strstr(continue_cstr, "vAttach") != nullptr &&
>                    response.GetError() == 0x87) {
>                  process->SetExitStatus(-1, "cannot attach to process due to "
>                                             "System Integrity Protection");
>                } else if (::strstr(continue_cstr, "vAttach") != nullptr &&
>                           response.GetStatus().Fail()) {
>                  process->SetExitStatus(-1, response.GetStatus().AsCString());
>                } else {
>                  process->SetExitStatus(-1, "lost connection");
>                }
>                break;
>              }
> 
> So when an interrupt fails, we immediately give up on the process, setting 
> the exit status to -1.  
> 
> Note, however, that this branch of the if doesn't set "done" to true.  The 
> switch is inside a 'while (done)', so after setting the state to 
> eStateInvalid, we go back to wait for another packet from the debug stub.  
> 
> SetExitStatus calls SetPrivateState(eStateExited), which will produce another 
> event, and normally we'll handle that and shut down nicely.  But by returning 
> to fetch another packet from debugserver, we're leaving ourselves open to 
> getting a delayed stop we are no longer set up to deal with.
> 
> The background is that there's a longstanding but infrequent crash in lldb, 
> which  I'm trying to fix.  I've never been able to reproduce it locally, so I 
> only have the end state.  
> 
> The crashing thread is the internal-state thread.  It is crashing because it 
> was handling a stop event, and went to ask the thread what to do, and the 
> ThreadPlanStack is corrupt in some form or other.  The symptoms vary, but 
> they all violate basic invariants in the ThreadPlanStack (like it will crash 
> because it has no elements, which is never true).  At the time of the crash, 
> there's always another thread which is some ways past the "wait for the 
> process to stop" part of the Destroy operation on the process, and has begun 
> tearing down Process objects.  That's pretty clearly why the ThreadPlanStack 
> is in a bad state...
> 
> So far as I can see, the only way that could happen is if the attempt to 
> interrupt timed out, but we didn't stop listening for packets, and didn't 
> immediately shut down the internal state thread.  So then debugserver woke 
> up, and sent the stop packet, and the internal state thread tried to process 
> it, but we were already in mid Destroy, and that went poorly...
> 
> The fact that we go back for another packet after getting eStateInvalid seems 
> to be what allows this to happen.
> 
> lldb has lacked the "done = true;" that I would have expected here since at 
> least the great reformatting, so I don't have any history on this?  
> 
> I added the "done = true;" and ran the testsuite, and there weren't any new 
> failures.  But this code is pretty complex, so there might be some reason for 
> waiting to see if the stub returns another packet after the interrupt fails.  
> But I can't think of one.
> 
> Can anybody else think of a reason why not to set done to true here?
> 
> Jim
> 
> 
> 
> _______________________________________________
> lldb-dev mailing list
> lldb-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Reply via email to