Re: [lldb-dev] More linux process control and IOHandler races

jingham Wed, 13 Aug 2014 10:08:44 -0700

> On Aug 13, 2014, at 1:49 AM, Matthew Gardiner <m...@csr.com> wrote:
> 
> Hi Greg,
> 
> Yes, it's clear that a planned attack is required for this. I understand the 
> need to have public and private state e.g. for source-level steps, and indeed 
> for launching a process. That's why I was a bit surprised to discover, after 
> debugging it, that the very first stop on "process launch" resulted in 
> ShouldBroadcastEvent returning true, as I thought that this would be an 
> instance where we'd like to hide this event from the user. I did spend 
> sometime trying to figure out how "ShouldStop" and "ShouldReportStop" worked, 
> but quickly found myself in the world of thread plans, a world in which I 
> know very little. So I'd certainly appreciate you guys, who have that 
> knowledge, thrashing this design out.
> 
> Having these (proposed) different ProcessState subclasses sounds to be a far 
> better solution than "broadcast hijacking + tricks". My only minor critique 
> of the design you floated being that you mentioned a "stack" of these 
> subclass instances, although when we receive an event (i.e. waitpid returns a 
> state change), it seems apparent that you'll then pass this event to (all) 
> the private and the public subclasses. (I tend to think of a stack as being a 
> system where you'd send the event to just the topmost element). Is there any 
> reason to have more just a private handler and a public handler?



I don't think Greg's description of the solution to handling driving the 
process was more than off the cuff comments.  I wouldn't parse it too closely.

Jim


> 
> Regarding simple, interim fixes, I did come up with something (which 
> addresses the observation associated with this thread's first post), which is 
> a little icky. I'll attach it though, to see if it attracts any comment. 
> Essentially, I have a flag which allows first stop to be identified, and 
> arranges that ShouldBroadcastEvent returns false for this stop. Also don't 
> push the IOHandler for launching state (I think it gets pushed later for the 
> running state), and force public state change after the first stop, which 
> seemed weird, but was required, due to the implementation of ProcessLaunch 
> waiting for the initial trap.
> 
> Anyway, thanks for you and Jim's interest in getting a good implementation 
> for this out there.
> 
> Matt
> 
> 
> Greg Clayton wrote:
>> Jim and I have discussed the right thing to do which we plan to implement as 
>> soon as we get the chance.
>> 
>> Current we have two process states: public and private. The private state is 
>> always up to date and is used to track what the process is currently doing. 
>> The public state gets set when we believe the end user should see an event 
>> that will update the GUI or TTY with the current state if needed (stopped or 
>> exited). There is a lot of tricky code that uses the private process state 
>> to manage things like the thread plans. Things like source level single step 
>> might involve starting and stopping and hitting breakpoints and single 
>> instruction stepping, but the user only wants to see the final eStateStopped 
>> when the source level step is done. Expressions are also tricky, the public 
>> state of the process is stopped, but when we run an expression, we might 
>> resume the process many times and the user will always see that the public 
>> state if stopped. Internally though the thread plans are doing a whole bunch 
>> of stuff to the process and the user never hears anything about these (no 
>> events!
  go public).
>> 
>> So the current public and private stuff uses broadcaster hijacking and all 
>> sorts of other tricks to avoid letting the end user know about the starts 
>> and stops that they shouldn't know about. We eventually want to have a new 
>> class, lets call in lldb_private::ProcessState for now, that manages the 
>> current process state and always receives the process events. There would be 
>> no more public and private events. We would have one version of the 
>> ProcessState subclass that would cause the GUI/TTY to update (replacing the 
>> public events) and one ProcessState subclass that would manage running 
>> thread plans. There would be stack of these and this will help us avoid the 
>> whole public/private/hijacking stuff we have now.
>> 
>> Launching a process would end up pushing a new ProcessState class that could 
>> handle all the necessary starts and stop we need to get the process to a 
>> quiescent state that is presentable to the user. The ProcessState subclass 
>> could eat all the events it needs to while launching and then propagate the 
>> stopped or running event when it gets popped off the stack.
>> 
>> So we should make some simple fixes for now to work around the issue and get 
>> things working, but it would also be great to get the new system up and 
>> running so we can make this easier in the future for people that want to 
>> make changes.
>> 
>> Greg
>> 
>> 
>>> On Aug 8, 2014, at 1:08 AM, Matthew Gardiner <m...@csr.com> wrote:
>>> 
>>> Folks,
>>> 
>>> Regarding "launching using the shell",  I don't think applies in the buggy 
>>> case that myself and Shawn are looking at.
>>> 
>>> I do:
>>> 
>>> (lldb) target create ./test
>>> ...
>>> (lldb) process launch
>>> ...
>>> 
>>> and when I inspected the call to exec, I see that my exe name is the 
>>> program passed.
>>> 
>>> Thanks for the insight into the broadcasting of stop events. That explains 
>>> why I see in the ShouldBroadcastEvent, the ShouldStop and ShouldReportStop 
>>> calls.
>>> 
>>> However, it would be nice to know if the first stop event should be 
>>> broadcast for "process launch".  I think it's an implementation detail, and 
>>> therefore should not. That would help to fix this issue.
>>> 
>>> However, Shawn's original suggestion to fix this issue circumvents the 
>>> above debate, by replacing the call of HandlePrivateEvent with 
>>> SetPublicState. So which fix is best? Calling SetPublicState rather than 
>>> HandlePrivateEvent is certainly more expedient, and avoids this debate... 
>>> but is it more portable/future-proof?
>>> 
>>> Matt
>>> 
>>> 
>>> jing...@apple.com wrote:
>>>> If you are launching using the shell, you'll see more stops before you get 
>>>> to the executable you are actually trying to launch.  In that case, 
>>>> instead of just running the binary directly you effectively do:
>>>> 
>>>> /bin/bash exec <binary> arg1 arg2
>>>> 
>>>> so that you can get bash (or whatever is set in $SHELL) to do the argument 
>>>> expansion for you.  GetResumeCountForLaunchInfo calculates this, then it 
>>>> is stuffed into the ProcessLaunchInfo (SetResumeCount).  On Mac OS X we 
>>>> always let the Platform launch, then attach, so in that case the 
>>>> AttachCompletionHandler does the extra resumes.  I'm not all that familiar 
>>>> with how the Linux side work, but it also seems to use the 
>>>> ProcessLaunchInfo's resume count.
>>>> 
>>>> Note that in general in lldb not all publicly broadcast stop messages are 
>>>> going to result in a stop.  For instance, all the breakpoint command & 
>>>> condition handling goes on as a result of the broadcast of the public stop 
>>>> event, but the process might just turn around an continue based on that.  
>>>> Whether to suppress the broadcast and continue from the private state 
>>>> thread or broadcast the event and let the upper levels of lldb take care 
>>>> of what happens from there on really depends on where it makes sense to 
>>>> handle the stop.  So for the case of breakpoint stops (or stop hooks, 
>>>> another example), those end up being equivalent to user typed commands, 
>>>> just done for the user automatically by the system.  So having them happen 
>>>> in a world where the public state wasn't sync'ed up to the private state 
>>>> ended up being very awkward.
>>>> 
>>>> I haven't looked at the launching code in detail recently so I am not sure 
>>>> whether it makes sense for the first stop to be handled as it is.
>>>> 
>>>> Jim
>>>> 
>>>> 
>>>>> On Aug 7, 2014, at 6:29 AM, Matthew Gardiner <m...@csr.com> wrote:
>>>>> 
>>>>> Hi Shawn,
>>>>> 
>>>>> I spent some time today looking at how to arrange for ShouldBroadcast to 
>>>>> return false for this first stop. I managed to produce a quick hack for 
>>>>> this (i.e. just counted the number of stops), but due to other 
>>>>> distractions (from the rest of my job) I didn't get that far into 
>>>>> discovering a nice way of achieving this...
>>>>> 
>>>>> What I did discover is that with my build just doing "process launch" 
>>>>> results in 3 stops (and 3 private resumes). That in itself I find 
>>>>> surprising, since I was under the impression that I should see the 
>>>>> inferior stop just once when exec is trapped by PTRACE_ME.
>>>>> 
>>>>> I then discovered that for each of these 3 stops ShouldBroadcast calls 
>>>>> Thread::ShouldStop, the Thread::ShouldStop returns true for the first 
>>>>> stop and false for the other 2. Looking into these behaviour differences 
>>>>> I then found that from within Thread::ShouldStop we then call into the 
>>>>> following:
>>>>> 
>>>>>    StopInfoSP private_stop_info (GetPrivateStopInfo());
>>>>>    if (private_stop_info && 
>>>>> private_stop_info->ShouldStopSynchronous(event_ptr) == false)
>>>>>    {
>>>>> 
>>>>> and also
>>>>> 
>>>>> bool over_ride_stop = current_plan->ShouldAutoContinue(event_ptr);
>>>>> 
>>>>> the results from either of these, it seems providing the reasoning behind 
>>>>> the different true/false returns. I'll return to spend a bit more time on 
>>>>> this tomorrow. Let me know if you get any further on a similar vein!
>>>>> 
>>>>> thanks
>>>>> Matt
>>>>> 
>>>>> 
>>>>> 
>>>>> Shawn Best wrote:
>>>>>> Matt,
>>>>>> 
>>>>>> I think you are probably right, although there are other places where it 
>>>>>> directly calls SetPublicState().   I was wondering about the possibility 
>>>>>> there could be other listeners waiting for a broadcast public Stop 
>>>>>> event.  Is that a possibility?
>>>>>> 
>>>>>> Some others here were investigating some unit tests that were failing 
>>>>>> intermittently (StopHook).  Their description of the problem sounds 
>>>>>> unrelated to the launch code, but this patch also magically fixes that.
>>>>>> 
>>>>>> Shawn.
>>>>>> 
>>>>>> On 8/6/2014 6:26 AM, Matthew Gardiner wrote:
>>>>>>> Shawn,
>>>>>>> 
>>>>>>> Like I said earlier your patch worked. However I think the right fix is 
>>>>>>> to arrange that ShouldBroadcast returns false for this first stop. I 
>>>>>>> believe this, because firstly no stops should be reported here since 
>>>>>>> the user is only interested in launching a program, and additionally 
>>>>>>> because it enables us to fix lldb without removing the call to 
>>>>>>> HandlePrivateEvent. This, I think, is important to preserve as the 
>>>>>>> central point for process state change handling.
>>>>>>> 
>>>>>>> Matt
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Shawn Best wrote:
>>>>>>>> Hi Matthew,
>>>>>>>> 
>>>>>>>> I have also been tracking this bug.  I believe there are other bugs in 
>>>>>>>> the unit tests failing indirectly because of this.  I also have a 
>>>>>>>> patch that will fix it, but was sitting on it until the other one 
>>>>>>>> landed.  These bugs do not show up on OSX since the inferiors are 
>>>>>>>> launched separately then attached to.
>>>>>>>> 
>>>>>>>> The first odd thing the launching code does is push an IOHandler when 
>>>>>>>> it sees the state transition to 'launching'. This is odd because I 
>>>>>>>> believe the launching program will always come up in a stopped state 
>>>>>>>> which will immediately pop the IOHandler.
>>>>>>>> 
>>>>>>>> At launch, the process comes up in the stopped state.  The launch code 
>>>>>>>> manually calls HandlePrivateEvent() with the stop event, which then 
>>>>>>>> broadcasts the Event.  When HandleProcessEvent gets the public stop, 
>>>>>>>> it dumps out the current thread state just as if an executing inferior 
>>>>>>>> hit a breakpoint and stopped.
>>>>>>>> 
>>>>>>>> One way to fix this would be:
>>>>>>>> 
>>>>>>>> 1. Don't push io handler when state is 'launching'
>>>>>>>> 2. Instead of manually calling HandlePrivateEvent, call 
>>>>>>>> SetPublicState().
>>>>>>>> 
>>>>>>>> Alternately, we could try and debug why ShouldBroadcast() returns 
>>>>>>>> true, but that appears to be by design since it is expecting the 
>>>>>>>> public stop event to pop the IOHandler that had been pushed when 
>>>>>>>> launching.
>>>>>>>> 
>>>>>>>> I have attached a patch demonstrating this.  In conjunction with the 
>>>>>>>> other patch for IOHandler race condition, it will fix a bunch of this 
>>>>>>>> kind of behaviour.
>>>>>>>> 
>>>>>>>> Shawn.
>>>>>>>> 
>>>>>>>> On 8/5/2014 6:59 AM, Matthew Gardiner wrote:
>>>>>>>>> Jim,
>>>>>>>>> 
>>>>>>>>> I've been trying to debug an issue (I see it on 64-bit linux) where, 
>>>>>>>>> I do "target create" and "process launch" and despite not requesting 
>>>>>>>>> *stop at entry*, the first stop (which I believe is just the initial 
>>>>>>>>> ptrace attach stop) is reported to the lldb command line. I added 
>>>>>>>>> some fprintf to Process::HandlePrivateEvent, which counts the number 
>>>>>>>>> of eStoppedState events seen and whether ShouldBroadcastEvent returns 
>>>>>>>>> true for this event. Here's the output from my program with 
>>>>>>>>> diagnostic:
>>>>>>>>> 
>>>>>>>>> (lldb) target create ~/me/i64-hello.elf
>>>>>>>>> Current executable set to '~/me/i64-hello.elf' (x86_64).
>>>>>>>>> (lldb) process launch
>>>>>>>>> MG Process::HandlePrivateEvent launching stopped_count 0 
>>>>>>>>> should_broadcast 1
>>>>>>>>> Process 31393 launching
>>>>>>>>> MG Process::HandlePrivateEvent stopped stopped_count 1 
>>>>>>>>> should_broadcast 1
>>>>>>>>> MG Process::HandlePrivateEvent running stopped_count 1 
>>>>>>>>> should_broadcast 1
>>>>>>>>> Process 31393 launched: 'i64-hello.elf' (x86_64)
>>>>>>>>> Process 31393 stopped
>>>>>>>>> * thread #1: tid = 31393, 0x0000003675a011f0, name = 'i64-hello.elf', 
>>>>>>>>> stop reason = trace
>>>>>>>>>    frame #0: 0x0000003675a011f0
>>>>>>>>> -> 0x3675a011f0:  movq   %rsp, %rdi
>>>>>>>>>   0x3675a011f3:  callq  0x3675a046e0
>>>>>>>>>   0x3675a011f8:  movq   %rax, %r12
>>>>>>>>>   0x3675a011fb:  movl   0x21eb97(%rip), %eax
>>>>>>>>> (lldb) MG Process::HandlePrivateEvent stopped stopped_count 2 
>>>>>>>>> should_broadcast 0
>>>>>>>>> MG Process::HandlePrivateEvent running stopped_count 2 
>>>>>>>>> should_broadcast 0
>>>>>>>>> MG Process::HandlePrivateEvent stopped stopped_count 3 
>>>>>>>>> should_broadcast 0
>>>>>>>>> MG Process::HandlePrivateEvent running stopped_count 3 
>>>>>>>>> should_broadcast 0
>>>>>>>>> 
>>>>>>>>> In summary, lldb reports the inferior to be stopped (even though 
>>>>>>>>> /proc/pid/status and lldb "target list" say it is running). Clearly 
>>>>>>>>> this is wrong (hence my earlier post).
>>>>>>>>> 
>>>>>>>>> Am I correct in assuming that when  ShouldBroadcastEvent returns true 
>>>>>>>>> this means that lldb should show this event to the debug user? (And 
>>>>>>>>> thus hide other events where ShouldBroadcastEvent=false).
>>>>>>>>> 
>>>>>>>>> What puzzled me was why ShouldBroadcastEvent return true for this 
>>>>>>>>> very first stop. Is this a bug?
>>>>>>>>> 
>>>>>>>>> I also spent sometime at ShouldBroadcastEvent and saw that this:
>>>>>>>>> 
>>>>>>>>>        case eStateStopped:
>>>>>>>>>        case eStateCrashed:
>>>>>>>>>        case eStateSuspended:
>>>>>>>>>        {
>>>>>>>>>         ....
>>>>>>>>>                if (was_restarted || should_resume || 
>>>>>>>>> m_resume_requested)
>>>>>>>>>                {
>>>>>>>>> 
>>>>>>>>> evaluates as false, and hence the PrivateResume code is not called... 
>>>>>>>>> does this seem buggy to you for this very first stop?
>>>>>>>>> 
>>>>>>>>> I thought I'd try asking you, since in a previous mail from Greg, he 
>>>>>>>>> cited you as being a thread-plan expert. (Hope that's ok!). I'd 
>>>>>>>>> really appreciate your help in clarifying the above questions for me, 
>>>>>>>>> and if you have time, giving me some ideas as to how to trace this 
>>>>>>>>> one further e.g. how m_thread_list.ShouldStop and 
>>>>>>>>> m_thread_list.ShouldReportStop should behave, etc.
>>>>>>>>> 
>>>>>>>>> thanks for your help
>>>>>>>>> Matt
>>>>>>>>> 
>>>>>>>>> Matthew Gardiner wrote:
>>>>>>>>>> Folks,
>>>>>>>>>> 
>>>>>>>>>> In addition to the overlapping prompt race Shawn Best and myself are 
>>>>>>>>>> looking at, I'm seeing another issue where if I launch a process, I 
>>>>>>>>>> get a stop (presumably the in) being reported to the UI.
>>>>>>>>>> 
>>>>>>>>>> (lldb) target create ~/mydir/i64-hello.elf
>>>>>>>>>> Current executable set to '~/mydir/i64-hello.elf' (x86_64).
>>>>>>>>>> (lldb) process launch
>>>>>>>>>> Process 27238 launching
>>>>>>>>>> Process 27238 launched: '64-hello.elf' (x86_64)
>>>>>>>>>> Process 27238 stopped
>>>>>>>>>> * thread #1: tid = 27238, 0x0000003675a011f0, name = 'i64-hello.elf'
>>>>>>>>>>    frame #0:
>>>>>>>>>> (lldb) target list
>>>>>>>>>> Current targets:
>>>>>>>>>> * target #0: i64-hello.elf ( arch=x86_64-unknown-linux, 
>>>>>>>>>> platform=host, pid=27238, state=running )
>>>>>>>>>> (lldb)
>>>>>>>>>> 
>>>>>>>>>> As you can see the "target list" reflects that the process is 
>>>>>>>>>> running. Which I confirmed by looking at /proc/27238/status.
>>>>>>>>>> 
>>>>>>>>>> Anyone else seeing this?
>>>>>>>>>> 
>>>>>>>>>> thanks
>>>>>>>>>> Matt
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Member of the CSR plc group of companies. CSR plc registered in 
>>>>>>>>>> England and Wales, registered number 4187346, registered office 
>>>>>>>>>> Churchill House, Cambridge Business Park, Cowley Road, Cambridge, 
>>>>>>>>>> CB4 0WZ, United Kingdom
>>>>>>>>>> More information can be found at www.csr.com <http://www.csr.com>. 
>>>>>>>>>> Keep up to date with CSR on our technical blog, www.csr.com/blog 
>>>>>>>>>> <http://www.csr.com/blog>, CSR people blog, www.csr.com/people 
>>>>>>>>>> <http://www.csr.com/people>, YouTube, www.youtube.com/user/CSRplc 
>>>>>>>>>> <http://www.youtube.com/user/CSRplc>, Facebook, 
>>>>>>>>>> www.facebook.com/pages/CSR/191038434253534 
>>>>>>>>>> <http://www.facebook.com/pages/CSR/191038434253534>, or follow us on 
>>>>>>>>>> Twitter at www.twitter.com/CSR_plc <http://www.twitter.com/CSR_plc>.
>>>>>>>>>> New for 2014, you can now access the wide range of products powered 
>>>>>>>>>> by aptX at www.aptx.com <http://www.aptx.com>.
>>>>>>>>>> _______________________________________________
>>>>>>>>>> lldb-dev mailing list
>>>>>>>>>> lldb-dev@cs.uiuc.edu <mailto:lldb-dev@cs.uiuc.edu>
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> To report this email as spam click 
>>>>>>>>>> https://www.mailcontrol.com/sr/EjKNgqvIx0TGX2PQPOmvUj!GOBh06pKKNwnTW0ZqkNYNbZeofOurgZMo6Cl2EgPiaCw7kl6fPUTCXaTERp6oIw==
>>>>>>>>>>  
>>>>>>>>>> <https://www.mailcontrol.com/sr/EjKNgqvIx0TGX2PQPOmvUj%21GOBh06pKKNwnTW0ZqkNYNbZeofOurgZMo6Cl2EgPiaCw7kl6fPUTCXaTERp6oIw==>
>>>>>>>>>>  .
>>>>>>>>> _______________________________________________
>>>>>>>>> lldb-dev mailing list
>>>>>>>>> lldb-dev@cs.uiuc.edu <mailto:lldb-dev@cs.uiuc.edu>
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev@cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> 
> <mg_launch.diff>


_______________________________________________
lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev

Re: [lldb-dev] More linux process control and IOHandler races

Reply via email to