> One of the expedited registers (well, the PC) has the little-endian value
> for the address I see listed in the T response to the $? on startup.

Meaning, for the spurious disassembly on "(lldb) run", e.g.

> thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace

from above, the address of the stop is in fact the PC from the expedited
registers in the first T response to the $? query.
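For reference, this is roughly what that decode looks like. A stop-reply
("T") packet expedites registers as regno:hexbytes; pairs in target byte
order, so on x86_64 the PC (register 0x10) arrives as little-endian hex.
A minimal sketch (illustrative only, not lldb's actual parser; the sample
packet fragment in the comment is an assumption):

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Decode a little-endian hex register blob such as "d0c2b9993b7f0000"
    // (an expedited PC, e.g. from a reply like "T05...;10:d0c2b9993b7f0000;")
    // into the address 0x00007f3b99b9c2d0.
    static uint64_t DecodeLittleEndianHex(const std::string &hex)
    {
        uint64_t value = 0;
        for (size_t i = 0; i + 1 < hex.size(); i += 2)
        {
            // Each hex pair is one byte, least-significant byte first.
            uint8_t byte = static_cast<uint8_t>(
                std::stoul(hex.substr(i, 2), nullptr, 16));
            value |= static_cast<uint64_t>(byte) << ((i / 2) * 8);
        }
        return value;
    }

    int main()
    {
        printf("pc = 0x%016llx\n", (unsigned long long)
               DecodeLittleEndianHex("d0c2b9993b7f0000"));
        return 0;
    }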
On Sun, Jun 29, 2014 at 12:18 AM, Todd Fiala <todd.fi...@gmail.com> wrote:

> Er...
>
> Ok - so that's not quite right.
>
> In the llgs case, there is an initial probe with a $? command that asks
> where things are at after the inferior is launched. One of the expedited
> registers (well, the PC) has the little-endian value for the address I
> see listed in the T response to the $? on startup. I'm going to look and
> see if the $? is really supposed to respond with a T on startup. I've
> noticed (and adhere to) not responding with a T on launch for the
> initial stop. It might be that $? should likewise not respond with
> anything there.
>
> So this still might be a different thing on llgs than on local Linux.
>
> -Todd
>
>
> On Sun, Jun 29, 2014 at 12:11 AM, Todd Fiala <todd.fi...@gmail.com> wrote:
>
>> Here's another interesting bit. I have been combing through the
>> NativeProcessLinux code in the llgs branch as I tighten everything up
>> and work through known issues.
>>
>> So for this part Shawn called out above:
>>
>> (lldb) run
>>
>> Process 4417 launching
>> Process 4417 stopped
>> * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>>     frame #0: 0x00007f3b99b9c2d0
>> ->  0x7f3b99b9c2d0: movq %rsp, %rdi
>>     0x7f3b99b9c2d3: callq 0x7f3b99b9fa70
>>     0x7f3b99b9c2d8: movq %rax, %r12
>>     0x7f3b99b9c2db: movl 0x221b17(%rip), %eax
>>
>> Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>>
>> I have been seeing this in lldb connected to llgs as well as in local
>> debugging. It happens sometimes. However, on lldb <=> llgs, I have a
>> complete view of the protocol (the gdb-remote packets), and I can see
>> that the disassembled stop point never comes up in the gdb-remote log
>> as a stop notification (a T). I am starting to suspect that there might
>> be something in the stack unwinding, symbolication or something else
>> that is perhaps racy, or maybe sensitive to taking too long to perform
>> some step. The address listed in the llgs case is close to some
>> addresses that are getting probed over the gdb-remote protocol.
>>
>> I'm still tracking this down in the llgs case and I'll double-check to
>> make sure I'm not somehow misreading the gdb-remote packet. More on
>> this later if I find anything useful.
>>
>> -Todd
>>
>> On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <tfi...@google.com> wrote:
>>
>>> > This stopping at random locations seems like a racy bug in the
>>> > ProcessLinux that we should really look into fixing.
>>>
>>> (Just doing some correlation with the llgs NativeProcessLinux code,
>>> which is from the same lineage as the Linux ProcessMonitor but has
>>> diverged somewhat.)
>>>
>>> Some of these I have hit as well in the NativeProcessLinux code. I
>>> happen to still see this one (the "sometimes dump stack info on
>>> startup, then go forward") when creating a process with lldb connected
>>> to llgs. No doubt the same source, since NativeProcessLinux started
>>> from the same code.
>>>
>>> > This is bad, please use native types (::pid_t) for these locations
>>> > so that this works correctly.
>>>
>>> I've fixed most of those, as I got warnings on them on the
>>> NativeProcessLinux side - I think all of them, but I'll double-check.
>>> > 1 - on linux does a process start running first, then you quickly
>>> > try to attach to it? If so, this could explain the difference you
>>> > might be seeing when connecting to a process? On Darwin, our
>>> > posix_spawn() has a non-portable flag that stops the process at the
>>> > entry point with a SIGSTOP so we are guaranteed to not have a race
>>> > condition when we launch a process for debugging.
>>>
>>> We have complete control of the launch - we fork a process, do an
>>> execve(), and the monitor gets a trap on the execve(), so there
>>> shouldn't be a race. This is the case for both local Linux debugging
>>> and NativeProcessLinux/llgs.
>>>
>>> I've started diverging on signal handling fairly significantly in
>>> NativeProcessLinux, so it's possible some of the behavior will be
>>> different in some spots. I have llgs working in cases where local
>>> Linux debugging is failing on my end now.
>>>
>>> Shawn - make sure you look at these types of bugs in the llgs branch
>>> with llgs - I'd rather you put effort into NativeProcessLinux than
>>> into the Linux ProcessMonitor.
>>>
>>> For llgs, I do have the GDBRemoteCommunicationServer set up to be a
>>> delegate to get messages like process stops. It gets installed as the
>>> delegate before llgs actually does the launch or attach call, so
>>> there's no chance of the delegate missing a startup event. We must be
>>> doing something else funky (probably the same, but possibly a
>>> different root cause) in both the local Linux and NativeProcessLinux
>>> startup sequences to cause that race.
>>>
>>> > 2 - The messages coming in out of order seem to be related to
>>> > sending the eStateLaunching and eStateStopped not being delivered in
>>> > the correct order. Your first example, they came through OK, and in
>>> > the second case we got an eStateStopped first followed by the
>>> > eStateLaunching. I would take a look at who is sending these out of
>>> > order. If you fix these out-of-order events, it might fix your
>>> > random stopping at a wrong location?
>>>
>>> As far as llgs goes, I do ensure the delegate
>>> (GDBRemoteCommunicationServer) gets those two in order, but at the
>>> moment I don't have llgs doing anything when it gets an
>>> eStateLaunching event. Not sure if that's relevant there. What does
>>> debugserver do when it gets an eStateLaunching internally?
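The launch sequence Todd describes above (fork, execve(), and the monitor
catching a trap on the exec) is the standard ptrace pattern. A minimal,
self-contained sketch of that shape (an illustration, not the actual
NativeProcessLinux code; the function name is invented):

    #include <cstdio>
    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <unistd.h>

    pid_t LaunchTracedInferior(const char *path, char *const argv[],
                               char *const envp[])
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            // Child: request tracing before exec. The kernel stops the
            // child with SIGTRAP when execve() succeeds, so the monitor
            // owns it from the first instruction -- no attach race.
            ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
            execve(path, argv, envp);
            _exit(127); // only reached if execve() failed
        }
        if (pid > 0)
        {
            // Parent/monitor: wait for the exec trap before doing
            // anything else with the inferior.
            int status = 0;
            if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status))
                printf("inferior %d stopped at exec, signal %d\n",
                       (int)pid, WSTOPSIG(status));
        }
        return pid;
    }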
>>> On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclay...@apple.com> wrote:
>>>
>>>> > On Jun 26, 2014, at 4:51 PM, Shawn Best <sb...@blueshiftinc.com> wrote:
>>>> >
>>>> > In addition to the (lldb) prompt out of order, I am also
>>>> > investigating some other strange messages when I run a simple
>>>> > application with no breakpoints. It seems related to thread
>>>> > synchronization surrounding the startup/management of the inferior
>>>> > process.
>>>> >
>>>> > (lldb) run
>>>> >
>>>> > Process 4417 launching
>>>> > Process 4417 stopped
>>>> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>>>> >     frame #0: 0x00007f3b99b9c2d0
>>>> > ->  0x7f3b99b9c2d0: movq %rsp, %rdi
>>>> >     0x7f3b99b9c2d3: callq 0x7f3b99b9fa70
>>>> >     0x7f3b99b9c2d8: movq %rax, %r12
>>>> >     0x7f3b99b9c2db: movl 0x221b17(%rip), %eax
>>>> >
>>>> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>>>> > Hello world!
>>>> > The string is Test String : 5
>>>> > Process 4417 exited with status = 0 (0x00000000)
>>>> > (lldb)
>>>> >
>>>> > ------------- or ----------------
>>>> >
>>>> > (lldb) run
>>>> >
>>>> > Process 4454 launching
>>>> > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
>>>> > Process 4454 stopped
>>>> > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason = trace
>>>> >     frame #0: 0x00007ffdec16c2d0
>>>> > error: No such process
>>>> >
>>>> > Hello world!
>>>> > The string is Test String : 5
>>>> > Process 4454 exited with status = 0 (0x00000000)
>>>> > (lldb)
>>>> >
>>>> >
>>>> > As it is launching the target application, it appears to stop in a
>>>> > random place (stop reason = trace) and then continue executing.
>>>> > When it momentarily stops, I see it pop/push an IOHandler.
>>>>
>>>> Yes, the Process IOHandler is pushed and popped on every _public_
>>>> stop. There are notions of public stops that the user finds out
>>>> about, and private stops where the Process might be in the middle of
>>>> trying to single-step over a source line and might start/stop the
>>>> process many, many times.
>>>>
>>>> This stopping at random locations seems like a racy bug in
>>>> ProcessLinux that we should really look into fixing.
>>>>
>>>> > I added some logging to ProcessPOSIX, and see it hitting
>>>> > RefreshAfterStop() and DoResume() many times. Is this
>>>> > normal/expected?
>>>>
>>>> When you start a process, you will run/stop many times as the shared
>>>> libraries get loaded. Normally a breakpoint is set in the dynamic
>>>> loader that allows us to intercept when shared libraries are
>>>> loaded/unloaded, so that may explain a few of the stops you are
>>>> seeing.
>>>>
>>>> Other run/stop flurries can result when single-stepping over a source
>>>> line, or stepping past a software breakpoint (disable bp, single
>>>> instruction step, re-enable breakpoint, resume).
>>>>
>>>> > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate,
>>>> > HandleProcessEvent and see big differences in the order of events
>>>> > changing from run to run.
>>>>
>>>> We have a lot of threading in LLDB, so some of this will be normal,
>>>> but other times it can indicate a bug, much like you are seeing when
>>>> the process stops at a random location 0x00007ffdec16c2d0. This could
>>>> also be an uninitialized variable in ProcessLinux (or other classes
>>>> like ThreadLinux, etc.) that gets a random value when a class
>>>> instance is initialized. Please do try and track that down. To get a
>>>> handle on process control you can enable process and step logging:
>>>>
>>>> (lldb) log enable -T -f /tmp/process.txt lldb process step
>>>>
>>>> Then compare a good and a bad run and see what differs.
>>>>
>>>> > One other small thing: in POSIX/ProcessMonitor, it calls waitpid()
>>>> > and checks the return code:
>>>> >
>>>> > lldb::pid_t wpid;
>>>> > if ((wpid = waitpid(pid, &status, 0)) < 0)
>>>> > {
>>>> >     args->m_error.SetErrorToErrno();
>>>> >     goto FINISH;
>>>> > }
>>>> > else ...
>>>> >
>>>> > lldb::pid_t is a uint64, while waitpid() returns an int32, with
>>>> > negative numbers used for error codes. This bug is repeated in a
>>>> > few places.
>>>>
>>>> This is bad, please use native types (::pid_t) for these locations so
>>>> that this works correctly.
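To make that concrete: the unsigned 64-bit lldb::pid_t silently converts
waitpid()'s -1 error return into a huge positive value, so the "< 0" test
can never fire. A small sketch of the bug and the fix (lldb_pid_t below is
a stand-in for lldb::pid_t, and the function name is invented):

    #include <cstdint>
    #include <sys/wait.h>

    typedef uint64_t lldb_pid_t; // stand-in for lldb::pid_t

    bool WaitForInferior(::pid_t pid, int &status)
    {
        // Buggy form: wpid is unsigned, so -1 becomes
        // 0xFFFFFFFFFFFFFFFF and the branch is never taken:
        //   lldb_pid_t wpid;
        //   if ((wpid = waitpid(pid, &status, 0)) < 0) { /* dead code */ }

        // Fixed form: keep the native signed type until the sign check
        // is done, then widen if an lldb::pid_t is needed.
        ::pid_t wpid = waitpid(pid, &status, 0);
        if (wpid < 0)
            return false; // errno describes the failure (ECHILD, EINTR, ...)

        lldb_pid_t lldb_pid = static_cast<lldb_pid_t>(wpid);
        (void)lldb_pid;
        return true;
    }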
>>>> So a few things regarding your race conditions:
>>>>
>>>> 1 - on linux, does a process start running first, and then you
>>>> quickly try to attach to it? If so, this could explain the difference
>>>> you might be seeing when connecting to a process. On Darwin, our
>>>> posix_spawn() has a non-portable flag that stops the process at the
>>>> entry point with a SIGSTOP, so we are guaranteed not to have a race
>>>> condition when we launch a process for debugging [see the posix_spawn
>>>> sketch after the thread].
>>>>
>>>> 2 - The messages coming in out of order seem to be related to the
>>>> eStateLaunching and eStateStopped events not being delivered in the
>>>> correct order. In your first example they came through OK, and in the
>>>> second case we got an eStateStopped first, followed by the
>>>> eStateLaunching. I would take a look at who is sending these out of
>>>> order. If you fix these out-of-order events, it might fix your random
>>>> stopping at a wrong location?
>>>>
>>>> Greg
>>>>
>>>> > Shawn.
>>>> >
>>>> > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclay...@apple.com> wrote:
>>>> >
>>>> > > On Jun 19, 2014, at 7:27 PM, Ed Maste <ema...@freebsd.org> wrote:
>>>> > >
>>>> > > Hi Greg,
>>>> > >
>>>> > > As far as I can tell what's happening here is just that
>>>> > > Process::Resume() completes and the next prompt is emitted (from
>>>> > > the main-thread?) before the IOHandler gets pushed in another
>>>> > > thread.
>>>> > >
>>>> > > Output from "log enable -n lldb process" with an added log printf
>>>> > > where ::Resume returns:
>>>> > >
>>>> > > step
>>>> > > main-thread Process::Resume -- locking run lock
>>>> > > main-thread Process::PrivateResume() m_stop_id = 4, public state:
>>>> > > stopped private state: stopped
>>>> > > main-thread Process::SetPrivateState (running)
>>>> > > main-thread Process thinks the process has resumed.
>>>> > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new
>>>> > > state: running, last broadcast state: running - YES
>>>> > > main-thread Process::PrivateResume() returning
>>>> > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)
>>>> > > broadcasting new state running (old state stopped) to public
>>>> > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =
>>>> > > 15646, &status, options = 0) => pid = -15646, status = 0x0000057f
>>>> > > (STOPPED), signal = 5, exit_state = 0
>>>> > > internal-state(p PushIOHandler
>>>> > > wait4(pid=15646) Process::SetPrivateState (stopped)
>>>> > >
>>>> > > As before, I don't see how we intend to enforce synchronization
>>>> > > between those two threads. It looks like my tiny usleep in
>>>> > > ::PrivateResume delays the next prompt just long enough for the
>>>> > > other IOHandler to be pushed.
>>>> >
>>>> > That will do it. It is tough because Process::Resume() might not
>>>> > succeed, so we can't always push the ProcessIOHandler.
>>>> >
>>>> > I need to find a better way to coordinate the pushing of the
>>>> > ProcessIOHandler so it happens from the same thread that initiates
>>>> > the resume. Then we won't have this issue, but I need to do this
>>>> > carefully so it doesn't push when the process won't be resumed
>>>> > (since it might already be resumed) or in other edge cases.
>>>> >
>>>> > Other ideas would be to have Process::Resume() do some
>>>> > synchronization between the current thread and the internal-state
>>>> > thread, so it waits for the internal-state thread to get to the
>>>> > running state before it returns from Process::Resume()...
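Greg's last idea maps naturally onto a condition variable. A hedged sketch
of that shape (illustrative only; ResumeSync and its method names are
invented here, not LLDB's API):

    #include <condition_variable>
    #include <mutex>

    class ResumeSync
    {
        std::mutex m_mutex;
        std::condition_variable m_cv;
        bool m_running = false;

    public:
        // Called by the internal-state thread once it has handled the
        // running event and pushed the ProcessIOHandler.
        void NotifyRunning()
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_running = true;
            }
            m_cv.notify_all();
        }

        // Called at the end of Process::Resume() on the driving thread,
        // so the next prompt cannot be printed before the IOHandler is
        // pushed.
        void WaitForRunning()
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [this] { return m_running; });
        }
    };

Real code would also need a way out when the resume fails (Greg's caveat
above), e.g. a second flag or a timed wait, so Resume() doesn't block
forever on a process that never started running.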
>>>> > Greg
>>>
>>> --
>>> Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180
>>
>> --
>> -Todd
>
> --
> -Todd

--
-Todd
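As an aside on Greg's point 1 above: the non-portable Darwin flag he
refers to is POSIX_SPAWN_START_SUSPENDED, an Apple extension to
posix_spawn(). A minimal sketch of using it (Darwin-only; illustrative,
and the function name is invented):

    #include <signal.h>
    #include <spawn.h>
    #include <sys/types.h>

    pid_t SpawnSuspended(const char *path, char *const argv[],
                         char *const envp[])
    {
        posix_spawnattr_t attr;
        posix_spawnattr_init(&attr);
        // Apple-specific flag: the child is left stopped (as if by
        // SIGSTOP) at its entry point, so the debugger can attach before
        // any user code runs. Not available on Linux/FreeBSD.
        posix_spawnattr_setflags(&attr, POSIX_SPAWN_START_SUSPENDED);

        pid_t pid = -1;
        int err = posix_spawn(&pid, path, nullptr, &attr, argv, envp);
        posix_spawnattr_destroy(&attr);

        // Resume later with kill(pid, SIGCONT) once the debugger is set up.
        return err == 0 ? pid : -1;
    }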
_______________________________________________
lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev