Er... OK - so that's not quite right.

In the llgs case, there is an initial probe with a $? command that asks where things are at after the inferior is launched. One of the expedited registers (well, the PC) in the T response to that $? on startup holds, little-endian encoded, the address I see listed in the stop output. I'm going to look and see whether $? is really supposed to respond with a T on startup. I've noticed (and adhere to) the convention of not responding with a T on launch for the initial stop; it might be that $? should likewise not respond with anything there. So this still might be a different thing on llgs than on local Linux.

-Todd
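For illustration, a minimal sketch of the byte-order handling involved when reading an expedited register value out of a T packet. DecodeLittleEndianHex is a hypothetical helper, not LLDB code, and the example value assumes the x86_64 PC from the output quoted below.

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLDB code): decode a register value the way it is
// encoded in a gdb-remote stop-reply (T) packet -- pairs of hex digits in
// target memory order, which is little-endian for x86_64.
static uint64_t DecodeLittleEndianHex(const std::string &hex)
{
    uint64_t value = 0;
    // Byte 0 in the packet is the least significant byte of the register.
    for (size_t byte = 0; byte * 2 + 1 < hex.size(); ++byte)
    {
        const uint64_t b = std::stoul(hex.substr(byte * 2, 2), nullptr, 16);
        value |= b << (8 * byte);
    }
    return value;
}

int main()
{
    // The PC from the stop listing below, as its bytes would appear in the packet.
    assert(DecodeLittleEndianHex("d0c2b9993b7f0000") == 0x00007f3b99b9c2d0ULL);
    return 0;
}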
On Sun, Jun 29, 2014 at 12:11 AM, Todd Fiala <todd.fi...@gmail.com> wrote:

> Here's another interesting bit. I have been combing through the NativeProcessLinux code in the llgs branch as I tighten everything up and work through known issues.
>
> So for this part Shawn called out above:
>
> (lldb) run
>
> Process 4417 launching
> Process 4417 stopped
> * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>     frame #0: 0x00007f3b99b9c2d0
> -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
>    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
>    0x7f3b99b9c2d8:  movq   %rax, %r12
>    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
>
> Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>
> I have been seeing this in lldb connected to llgs as well as in local debugging. It happens sometimes. However, on lldb <=> llgs I have a complete view of the protocol (the gdb-remote packets), and I can see that the disassembled stop point is not ever coming up in the gdb-remote log as a stop notification (a T). I am starting to suspect that there might be something in the stack unwinding, symbolication or something else that is perhaps racy or maybe sensitive to taking too long to perform some step. The address listed in the llgs case is close to some addresses that are getting probed on the gdb-remote protocol.
>
> I'm still tracking this down in the llgs case and I'll double-check to make sure I'm not somehow misreading the gdb-remote packet. More on this later if I find anything useful.
>
> -Todd
>
> On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <tfi...@google.com> wrote:
>
>> > This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.
>>
>> (Just doing some correlation with the llgs NativeProcessLinux code, which is from the same lineage as the Linux ProcessMonitor but has diverged somewhat.)
>>
>> Some of these I have hit as well in the NativeProcessLinux code. I happen to still see this one (the "sometimes dump stack info on startup, then go forward") when creating a process from lldb over to llgs. No doubt it is the same source, since NativeProcessLinux started from the same code.
>>
>> > This is bad, please use native types (::pid_t) for these locations so that this works correctly.
>>
>> I've fixed most of those (I think all, but I'm going to double-check) as I got warnings on them on the NativeProcessLinux side.
>>
>> > 1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non-portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.
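For reference, a rough sketch of the kind of suspended launch Greg describes, assuming the non-portable flag he means is Darwin's POSIX_SPAWN_START_SUSPENDED. This is illustrative only, not debugserver's actual launch code.

#include <spawn.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

// Sketch: launch a child that is stopped (SIGSTOP) before it executes any
// user code, so the debugger can attach and do its setup without racing it.
// POSIX_SPAWN_START_SUSPENDED is a Darwin-only posix_spawn attribute flag.
static pid_t LaunchSuspended(const char *exe_path, char *const argv[])
{
    posix_spawnattr_t attr;
    posix_spawnattr_init(&attr);
#ifdef POSIX_SPAWN_START_SUSPENDED
    posix_spawnattr_setflags(&attr, POSIX_SPAWN_START_SUSPENDED);
#endif
    pid_t child_pid = -1;
    const int err = posix_spawn(&child_pid, exe_path, nullptr, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (err != 0)
        return -1;
    // The child now sits stopped at its entry point; the debugger attaches,
    // sets breakpoints, and only then lets it continue.
    return child_pid;
}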
>> We have complete control of the launch - we fork a process, do an execve(), and the monitor gets a trap on the execve(), so there shouldn't be a race. This is the case for both local Linux debugging and NativeProcessLinux/llgs.
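For reference, a bare-bones sketch of that launch pattern on Linux (illustrative only, not the actual ProcessMonitor/NativeProcessLinux code): the child asks to be traced before it execs, so the kernel stops it with SIGTRAP when the execve completes and the monitor's waitpid() observes that stop before any inferior code runs.

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch of a race-free debug launch on Linux.
static pid_t LaunchForDebug(const char *exe_path, char *const argv[], char *const envp[])
{
    const pid_t child = fork();
    if (child == 0)
    {
        // Child: request tracing, then exec.  The kernel delivers a SIGTRAP
        // stop when the execve completes, before any inferior code runs.
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
        execve(exe_path, argv, envp);
        _exit(127); // only reached if execve failed
    }

    if (child > 0)
    {
        // Monitor/parent: wait for the exec trap; the inferior is now stopped
        // and under our control, so there is no attach race.
        int status = 0;
        waitpid(child, &status, 0);
        // Expect WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP here.
    }
    return child;
}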
>> I've started diverging on signal handling fairly significantly in NativeProcessLinux, so it's possible some of the behavior will be different in some spots. I have llgs working in cases where local Linux debugging is failing on my end now.
>>
>> Shawn - make sure you look at these types of bugs in the llgs branch with llgs - I'd rather you put effort into NativeProcessLinux than into the Linux ProcessMonitor.
>>
>> For llgs, I do have the GDBRemoteCommunicationServer set up to be a delegate to get messages like process stops. It gets installed as the delegate before llgs actually does the launch or attach call, so that there's no chance of the delegate missing a startup event. We must be doing something else funky (probably the same, but possibly a different, root cause) in both the local Linux and NativeProcessLinux startup sequences to cause that race.
>>
>> > 2 - The messages coming in out of order seem to be related to the eStateLaunching and eStateStopped events not being delivered in the correct order. In your first example they came through OK, and in the second case we got an eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix these out-of-order events, it might fix your random stopping at a wrong location?
>>
>> As far as llgs goes, I do ensure the delegate (GDBRemoteCommunicationServer) gets those two in order, but at the moment I don't have llgs doing anything when it gets an eStateLaunching event. Not sure if that's relevant there. What does debugserver do when it gets an eStateLaunching internally?
>>
>> On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclay...@apple.com> wrote:
>>
>>> > On Jun 26, 2014, at 4:51 PM, Shawn Best <sb...@blueshiftinc.com> wrote:
>>> >
>>> > In addition to the (lldb) prompt coming out of order, I am also investigating some other strange messages when I run a simple application with no breakpoints. It seems related to thread synchronization surrounding the startup/management of the inferior process.
>>> >
>>> > (lldb) run
>>> >
>>> > Process 4417 launching
>>> > Process 4417 stopped
>>> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>>> >     frame #0: 0x00007f3b99b9c2d0
>>> > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
>>> >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
>>> >    0x7f3b99b9c2d8:  movq   %rax, %r12
>>> >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
>>> >
>>> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>>> > Hello world!
>>> > The string is Test String : 5
>>> > Process 4417 exited with status = 0 (0x00000000)
>>> > (lldb)
>>> >
>>> > ------------- or ----------------
>>> >
>>> > (lldb) run
>>> >
>>> > Process 4454 launching
>>> > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
>>> > Process 4454 stopped
>>> > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason = trace
>>> >     frame #0: 0x00007ffdec16c2d0
>>> > error: No such process
>>> >
>>> > Hello world!
>>> > The string is Test String : 5
>>> > Process 4454 exited with status = 0 (0x00000000)
>>> > (lldb)
>>> >
>>> > As it is launching the target application, it appears to stop in a random place (stop reason = trace), and then continue executing. When it momentarily stops, I see it pop/push an IOHandler.
>>>
>>> Yes, the Process IO Handler is pushed and popped on every _public_ stop. There are notions of public stops that the user finds out about, and private stops where the Process might be in the middle of trying to single-step over a source line and might start/stop the process many, many times.
>>>
>>> This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.
>>>
>>> > I added some logging to ProcessPOSIX, and see it hitting RefreshAfterStop() and DoResume() many times. Is this normal/expected?
>>>
>>> When you start a process, you will run/stop many times as the shared libraries get loaded. Normally a breakpoint is set in the dynamic loader that allows us to intercept when shared libraries are loaded/unloaded, so that may explain a few of the stops you are seeing.
>>>
>>> Other run/stop flurries can result from single-stepping over a source line, or from stepping past a software breakpoint (disable bp, single instruction step, re-enable breakpoint, resume).
>>>
>>> > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate, HandleProcessEvent and see big differences in the order of events changing from run to run.
>>>
>>> We have a lot of threading in LLDB so some of this will be normal, but other times it can indicate a bug, much like you are seeing when the process stops at a random location 0x00007ffdec16c2d0. This could also be an uninitialized variable in ProcessLinux that gets a random value when a ProcessLinux (or ThreadLinux, etc.) class instance is initialized. Please do try and track that down. To get a handle on process control you can enable process and step logging:
>>>
>>> (lldb) log enable -T -f /tmp/process.txt lldb process step
>>>
>>> Then compare a good and a bad run and see what differs.
>>>
>>> > One other small thing: in POSIX/ProcessMonitor, it calls waitpid() and checks the return code,
>>> >
>>> > lldb::pid_t wpid;
>>> > if ((wpid = waitpid(pid, &status, 0)) < 0)
>>> > {
>>> >     args->m_error.SetErrorToErrno();
>>> >     goto FINISH;
>>> > }
>>> > else ...
>>> >
>>> > lldb::pid_t is a uint64, while waitpid() returns an int32, with negative numbers used for error codes. This bug is repeated in a few places.
>>>
>>> This is bad, please use native types (::pid_t) for these locations so that this works correctly.
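A minimal sketch of the corrected pattern (illustrative only, not the actual ProcessMonitor code): do the waitpid() and the sign check in the native ::pid_t, and only widen to the unsigned 64-bit lldb::pid_t after the check.

#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>
#include <cstdint>

typedef uint64_t lldb_pid_t; // stand-in for lldb::pid_t in this sketch

// Keep the waitpid() result in the native ::pid_t so that a negative error
// return really is negative when tested; an unsigned 64-bit lldb::pid_t
// would never compare < 0.
static bool WaitForChild(::pid_t pid, int &status, lldb_pid_t &wpid_out, int &err_out)
{
    const ::pid_t wpid = ::waitpid(pid, &status, 0);
    if (wpid < 0)
    {
        err_out = errno;
        return false;
    }
    wpid_out = static_cast<lldb_pid_t>(wpid); // widen only after the check
    return true;
}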
>>> So a few things regarding your race conditions:
>>>
>>> 1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non-portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.
>>>
>>> 2 - The messages coming in out of order seem to be related to the eStateLaunching and eStateStopped events not being delivered in the correct order. In your first example they came through OK, and in the second case we got an eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix these out-of-order events, it might fix your random stopping at a wrong location?
>>>
>>> Greg
>>>
>>> > Shawn.
>>> >
>>> > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclay...@apple.com> wrote:
>>> >
>>> > > On Jun 19, 2014, at 7:27 PM, Ed Maste <ema...@freebsd.org> wrote:
>>> > >
>>> > > Hi Greg,
>>> > >
>>> > > As far as I can tell, what's happening here is just that Process::Resume() completes and the next prompt is emitted (from the main-thread?) before the IOHandler gets pushed in another thread.
>>> > >
>>> > > Output from "log enable -n lldb process" with an added log printf where ::Resume returns:
>>> > >
>>> > > step
>>> > > main-thread Process::Resume -- locking run lock
>>> > > main-thread Process::PrivateResume() m_stop_id = 4, public state: stopped private state: stopped
>>> > > main-thread Process::SetPrivateState (running)
>>> > > main-thread Process thinks the process has resumed.
>>> > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new state: running, last broadcast state: running - YES
>>> > > main-thread Process::PrivateResume() returning
>>> > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646) broadcasting new state running (old state stopped) to public
>>> > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid = 15646, &status, options = 0) => pid = -15646, status = 0x0000057f (STOPPED), signal = 5, exit_state = 0
>>> > > internal-state(p PushIOHandler
>>> > > wait4(pid=15646) Process::SetPrivateState (stopped)
>>> > >
>>> > > As before, I don't see how we intend to enforce synchronization between those two threads. It looks like my tiny usleep in ::PrivateResume delays the next prompt just long enough for the other IOHandler to be pushed.
>>> >
>>> > That will do it. It is tough because Process::Resume() might not succeed, so we can't always push the ProcessIOHandler.
>>> >
>>> > I need to find a better way to coordinate the pushing of the ProcessIOHandler so it happens from the same thread that initiates the resume. Then we won't have this issue, but I need to do this carefully so it doesn't push it when the process won't be resumed (since it might already be resumed) or in other edge cases.
>>> >
>>> > Other ideas would be to have Process::Resume() do some synchronization between the current thread and the internal-state thread, so it waits for the internal-state thread to get to the running state before it returns from Process::Resume()...
>>> >
>>> > Greg
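A rough sketch of that kind of hand-off, using plain standard-library primitives rather than LLDB's own event machinery (illustrative only): the thread that initiated the resume blocks until the internal-state thread reports that it has handled the running state and pushed the ProcessIOHandler, so the next prompt cannot be printed first.

#include <condition_variable>
#include <mutex>

// Sketch only -- not LLDB's actual Process/event code.
class ResumeSync
{
public:
    // Called by the internal-state thread once it has processed the
    // running-state event (and pushed the ProcessIOHandler).
    void NotifyRunning()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_running = true;
        }
        m_cv.notify_all();
    }

    // Called at the end of Process::Resume() by the thread that initiated
    // the resume; blocks until the internal-state thread has caught up.
    void WaitForRunning()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_running; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_running = false;
};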
>> --
>> Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180
>
> --
> -Todd

--
-Todd