Er... OK - so that's not quite right.

In the llgs case, there is an initial probe with a $? command that asks where things are at after the inferior is launched. One of the expedited registers (well, the PC) in the T response to that $? on startup holds, little-endian encoded, the address I see listed in the stop output. I'm going to look and see whether $? is really supposed to respond with a T on startup. I've noticed (and adhere to) the convention of not responding with a T on launch for the initial stop; it might be that $? should likewise not respond with anything there. So this still might be a different thing on llgs than on local Linux.

-Todd
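For illustration, a minimal sketch of the byte-order handling involved when reading an expedited register value out of a T packet. DecodeLittleEndianHex is a hypothetical helper, not LLDB code, and the example value assumes the x86_64 PC from the output quoted below.

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLDB code): decode a register value the way it is
// encoded in a gdb-remote stop-reply (T) packet -- pairs of hex digits in
// target memory order, which is little-endian for x86_64.
static uint64_t DecodeLittleEndianHex(const std::string &hex)
{
    uint64_t value = 0;
    // Byte 0 in the packet is the least significant byte of the register.
    for (size_t byte = 0; byte * 2 + 1 < hex.size(); ++byte)
    {
        const uint64_t b = std::stoul(hex.substr(byte * 2, 2), nullptr, 16);
        value |= b << (8 * byte);
    }
    return value;
}

int main()
{
    // The PC from the stop listing below, as its bytes would appear in the packet.
    assert(DecodeLittleEndianHex("d0c2b9993b7f0000") == 0x00007f3b99b9c2d0ULL);
    return 0;
}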
On Sun, Jun 29, 2014 at 12:11 AM, Todd Fiala <todd.fi...@gmail.com> wrote:

> Here's another interesting bit. I have been combing through the NativeProcessLinux code in the llgs branch as I tighten everything up and work through known issues.
>
> So for this part Shawn called out above:
>
> (lldb) run
>
> Process 4417 launching
> Process 4417 stopped
> * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>     frame #0: 0x00007f3b99b9c2d0
> -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
>    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
>    0x7f3b99b9c2d8:  movq   %rax, %r12
>    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
>
> Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>
> I have been seeing this in lldb connected to llgs as well as in local debugging. It happens sometimes. However, on lldb <=> llgs I have a complete view of the protocol (the gdb-remote packets), and I can see that the disassembled stop point is not ever coming up in the gdb-remote log as a stop notification (a T). I am starting to suspect that there might be something in the stack unwinding, symbolication or something else that is perhaps racy or maybe sensitive to taking too long to perform some step. The address listed in the llgs case is close to some addresses that are getting probed on the gdb-remote protocol.
>
> I'm still tracking this down in the llgs case and I'll double-check to make sure I'm not somehow misreading the gdb-remote packet. More on this later if I find anything useful.
>
> -Todd
>
> On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <tfi...@google.com> wrote:
>
>> > This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.
>>
>> (Just doing some correlation with the llgs NativeProcessLinux code, which is from the same lineage as the Linux ProcessMonitor but has diverged somewhat.)
>>
>> Some of these I have hit as well in the NativeProcessLinux code. I happen to still see this one (the "sometimes dump stack info on startup, then go forward") when creating a process from lldb over to llgs. No doubt it is the same source, since NativeProcessLinux started from the same code.
>>
>> > This is bad, please use native types (::pid_t) for these locations so that this works correctly.
>>
>> I've fixed most of those (I think all, but I'm going to double-check) as I got warnings on them on the NativeProcessLinux side.
>>
>> > 1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non-portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.
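For reference, a rough sketch of the kind of suspended launch Greg describes, assuming the non-portable flag he means is Darwin's POSIX_SPAWN_START_SUSPENDED. This is illustrative only, not debugserver's actual launch code.

#include <spawn.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

// Sketch: launch a child that is stopped (SIGSTOP) before it executes any
// user code, so the debugger can attach and do its setup without racing it.
// POSIX_SPAWN_START_SUSPENDED is a Darwin-only posix_spawn attribute flag.
static pid_t LaunchSuspended(const char *exe_path, char *const argv[])
{
    posix_spawnattr_t attr;
    posix_spawnattr_init(&attr);
#ifdef POSIX_SPAWN_START_SUSPENDED
    posix_spawnattr_setflags(&attr, POSIX_SPAWN_START_SUSPENDED);
#endif
    pid_t child_pid = -1;
    const int err = posix_spawn(&child_pid, exe_path, nullptr, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (err != 0)
        return -1;
    // The child now sits stopped at its entry point; the debugger attaches,
    // sets breakpoints, and only then lets it continue.
    return child_pid;
}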
>> We have complete control of the launch - we fork a process, do an execve(), and the monitor gets a trap on the execve(), so there shouldn't be a race. This is the case for both local Linux debugging and NativeProcessLinux/llgs.
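For reference, a bare-bones sketch of that launch pattern on Linux (illustrative only, not the actual ProcessMonitor/NativeProcessLinux code): the child asks to be traced before it execs, so the kernel stops it with SIGTRAP when the execve completes and the monitor's waitpid() observes that stop before any inferior code runs.

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch of a race-free debug launch on Linux.
static pid_t LaunchForDebug(const char *exe_path, char *const argv[], char *const envp[])
{
    const pid_t child = fork();
    if (child == 0)
    {
        // Child: request tracing, then exec.  The kernel delivers a SIGTRAP
        // stop when the execve completes, before any inferior code runs.
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
        execve(exe_path, argv, envp);
        _exit(127); // only reached if execve failed
    }

    if (child > 0)
    {
        // Monitor/parent: wait for the exec trap; the inferior is now stopped
        // and under our control, so there is no attach race.
        int status = 0;
        waitpid(child, &status, 0);
        // Expect WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP here.
    }
    return child;
}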
>> I've started diverging on signal handling fairly significantly in NativeProcessLinux, so it's possible some of the behavior will be different in some spots. I have llgs working in cases where local Linux debugging is failing on my end now.
>>
>> Shawn - make sure you look at these types of bugs in the llgs branch with llgs - I'd rather you put effort into NativeProcessLinux than into the Linux ProcessMonitor.
>>
>> For llgs, I do have the GDBRemoteCommunicationServer set up to be a delegate to get messages like process stops. It gets installed as the delegate before llgs actually does the launch or attach call, so that there's no chance of the delegate missing a startup event. We must be doing something else funky (probably the same, but possibly a different, root cause) in both the local Linux and NativeProcessLinux startup sequences to cause that race.
>>
>> > 2 - The messages coming in out of order seem to be related to the eStateLaunching and eStateStopped events not being delivered in the correct order. In your first example they came through OK, and in the second case we got an eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix these out-of-order events, it might fix your random stopping at a wrong location?
>>
>> As far as llgs goes, I do ensure the delegate (GDBRemoteCommunicationServer) gets those two in order, but at the moment I don't have llgs doing anything when it gets an eStateLaunching event. Not sure if that's relevant there. What does debugserver do when it gets an eStateLaunching internally?
>>
>> On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclay...@apple.com> wrote:
>>
>>> > On Jun 26, 2014, at 4:51 PM, Shawn Best <sb...@blueshiftinc.com> wrote:
>>> >
>>> > In addition to the (lldb) prompt coming out of order, I am also investigating some other strange messages when I run a simple application with no breakpoints. It seems related to thread synchronization surrounding the startup/management of the inferior process.
>>> >
>>> > (lldb) run
>>> >
>>> > Process 4417 launching
>>> > Process 4417 stopped
>>> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>>> >     frame #0: 0x00007f3b99b9c2d0
>>> > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
>>> >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
>>> >    0x7f3b99b9c2d8:  movq   %rax, %r12
>>> >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
>>> >
>>> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>>> > Hello world!
>>> > The string is Test String : 5
>>> > Process 4417 exited with status = 0 (0x00000000)
>>> > (lldb)
>>> >
>>> > ------------- or ----------------
>>> >
>>> > (lldb) run
>>> >
>>> > Process 4454 launching
>>> > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
>>> > Process 4454 stopped
>>> > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason = trace
>>> >     frame #0: 0x00007ffdec16c2d0
>>> > error: No such process
>>> >
>>> > Hello world!
>>> > The string is Test String : 5
>>> > Process 4454 exited with status = 0 (0x00000000)
>>> > (lldb)
>>> >
>>> > As it is launching the target application, it appears to stop in a random place (stop reason = trace), and then continue executing. When it momentarily stops, I see it pop/push an IOHandler.
>>>
>>> Yes, the Process IO Handler is pushed and popped on every _public_ stop. There are notions of public stops that the user finds out about, and private stops where the Process might be in the middle of trying to single-step over a source line and might start/stop the process many, many times.
>>>
>>> This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.
>>>
>>> > I added some logging to ProcessPOSIX, and see it hitting RefreshAfterStop() and DoResume() many times. Is this normal/expected?
>>>
>>> When you start a process, you will run/stop many times as the shared libraries get loaded. Normally a breakpoint is set in the dynamic loader that allows us to intercept when shared libraries are loaded/unloaded, so that may explain a few of the stops you are seeing.
>>>
>>> Other run/stop flurries can result from single-stepping over a source line, or from stepping past a software breakpoint (disable bp, single instruction step, re-enable breakpoint, resume).
>>>
>>> > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate, HandleProcessEvent and see big differences in the order of events changing from run to run.
>>>
>>> We have a lot of threading in LLDB so some of this will be normal, but other times it can indicate a bug, much like you are seeing when the process stops at a random location 0x00007ffdec16c2d0. This could also be an uninitialized variable in ProcessLinux that gets a random value when a ProcessLinux (or ThreadLinux, etc.) class instance is initialized. Please do try and track that down. To get a handle on process control you can enable process and step logging:
>>>
>>> (lldb) log enable -T -f /tmp/process.txt lldb process step
>>>
>>> Then compare a good and a bad run and see what differs.
>>>
>>> > One other small thing: in POSIX/ProcessMonitor, it calls waitpid() and checks the return code,
>>> >
>>> > lldb::pid_t wpid;
>>> > if ((wpid = waitpid(pid, &status, 0)) < 0)
>>> > {
>>> >     args->m_error.SetErrorToErrno();
>>> >     goto FINISH;
>>> > }
>>> > else ...
>>> >
>>> > lldb::pid_t is a uint64, while waitpid() returns an int32, with negative numbers used for error codes. This bug is repeated in a few places.
>>>
>>> This is bad, please use native types (::pid_t) for these locations so that this works correctly.
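A minimal sketch of the corrected pattern (illustrative only, not the actual ProcessMonitor code): do the waitpid() and the sign check in the native ::pid_t, and only widen to the unsigned 64-bit lldb::pid_t after the check.

#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>
#include <cstdint>

typedef uint64_t lldb_pid_t; // stand-in for lldb::pid_t in this sketch

// Keep the waitpid() result in the native ::pid_t so that a negative error
// return really is negative when tested; an unsigned 64-bit lldb::pid_t
// would never compare < 0.
static bool WaitForChild(::pid_t pid, int &status, lldb_pid_t &wpid_out, int &err_out)
{
    const ::pid_t wpid = ::waitpid(pid, &status, 0);
    if (wpid < 0)
    {
        err_out = errno;
        return false;
    }
    wpid_out = static_cast<lldb_pid_t>(wpid); // widen only after the check
    return true;
}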
>>> So a few things regarding your race conditions:
>>>
>>> 1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non-portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.
>>>
>>> 2 - The messages coming in out of order seem to be related to the eStateLaunching and eStateStopped events not being delivered in the correct order. In your first example they came through OK, and in the second case we got an eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix these out-of-order events, it might fix your random stopping at a wrong location?
>>>
>>> Greg
>>>
>>> > Shawn.
>>> >
>>> > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclay...@apple.com> wrote:
>>> >
>>> > > On Jun 19, 2014, at 7:27 PM, Ed Maste <ema...@freebsd.org> wrote:
>>> > >
>>> > > Hi Greg,
>>> > >
>>> > > As far as I can tell, what's happening here is just that Process::Resume() completes and the next prompt is emitted (from the main-thread?) before the IOHandler gets pushed in another thread.
>>> > >
>>> > > Output from "log enable -n lldb process" with an added log printf where ::Resume returns:
>>> > >
>>> > > step
>>> > > main-thread Process::Resume -- locking run lock
>>> > > main-thread Process::PrivateResume() m_stop_id = 4, public state: stopped private state: stopped
>>> > > main-thread Process::SetPrivateState (running)
>>> > > main-thread Process thinks the process has resumed.
>>> > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new state: running, last broadcast state: running - YES
>>> > > main-thread Process::PrivateResume() returning
>>> > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646) broadcasting new state running (old state stopped) to public
>>> > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid = 15646, &status, options = 0) => pid = -15646, status = 0x0000057f (STOPPED), signal = 5, exit_state = 0
>>> > > internal-state(p PushIOHandler
>>> > > wait4(pid=15646) Process::SetPrivateState (stopped)
>>> > >
>>> > > As before, I don't see how we intend to enforce synchronization between those two threads. It looks like my tiny usleep in ::PrivateResume delays the next prompt just long enough for the other IOHandler to be pushed.
>>> >
>>> > That will do it. It is tough because Process::Resume() might not succeed, so we can't always push the ProcessIOHandler.
>>> >
>>> > I need to find a better way to coordinate the pushing of the ProcessIOHandler so it happens from the same thread that initiates the resume. Then we won't have this issue, but I need to do this carefully so it doesn't push it when the process won't be resumed (since it might already be resumed) or in other edge cases.
>>> >
>>> > Other ideas would be to have Process::Resume() do some synchronization between the current thread and the internal-state thread, so it waits for the internal-state thread to get to the running state before it returns from Process::Resume()...
>>> >
>>> > Greg
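A rough sketch of that kind of hand-off, using plain standard-library primitives rather than LLDB's own event machinery (illustrative only): the thread that initiated the resume blocks until the internal-state thread reports that it has handled the running state and pushed the ProcessIOHandler, so the next prompt cannot be printed first.

#include <condition_variable>
#include <mutex>

// Sketch only -- not LLDB's actual Process/event code.
class ResumeSync
{
public:
    // Called by the internal-state thread once it has processed the
    // running-state event (and pushed the ProcessIOHandler).
    void NotifyRunning()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_running = true;
        }
        m_cv.notify_all();
    }

    // Called at the end of Process::Resume() by the thread that initiated
    // the resume; blocks until the internal-state thread has caught up.
    void WaitForRunning()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_running; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_running = false;
};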
>> --
>> Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180
>
> --
> -Todd

--
-Todd