Re: [lldb-dev] Locking issues on windows

Greg Clayton Thu, 18 Apr 2013 09:49:20 -0700


On Apr 17, 2013, at 8:31 PM, "Thirumurthi, Ashok" <[email protected]> 
wrote:


> > Are there packages to install for a suitable version of clang?
>  
> FYI Greg, you can download freshly built packages for Debian or Ubuntu 
> using the instructions at http://llvm.org/apt/.  Basically, for Ubuntu 12.04, 
> you can just add the following line to /etc/apt/sources.list:
>       deb http://llvm.org/apt/precise/ llvm-toolchain-precise main
> and then run
>       sudo apt-get install clang-3.3

Great, I will try this today.
>  
> That should give you a tool-chain to build llvm, clang and lldb from source.  
> Thanks in advance for all your efforts to get a Linux machine in operation.  
> Say, is this something that you plan to include in pre-commit testing when 
> operational?

Not until we get a real machine we can use. Parallels Desktop-emulated Linux is 
just too slow right now, but I am going to try to get a dedicated machine for 
our group.

>  
> > could you please revert 179329 until we have something that allows us to 
> > run the tests?
>  
> With your fixes in r179378, the test suite runs to completion on one of my 
> test machines.  However, both lldb buildbots for Linux continue to timeout 
> when running the test suite.  The buildbots are clearly helpful to identify 
> the commits that introduce new regressions.  We’ll look into the buildbots 
> more tomorrow. 
>  
> Let us know if you have any concerns with reverting the two lock-related 
> commits (if needed) until we have something more stable.
>  

No worries, this needed to be done to get things rolling again.

I am going to put r179329 back in, but I will #ifdef __APPLE__ it out so it 
doesn't affect other platforms until we can get the kinks worked out. I will 
add more DEBUG build assertions to make sure that no errors are returned from 
these functions to hopefully get all of the ref counting done correctly.
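A minimal sketch of the two measures mentioned above, assuming a pthread-based read/write lock: every lock call's return code is asserted in DEBUG builds (when NDEBUG is not defined), and the new behavior is gated on __APPLE__. CheckedPthreadCall, DebugCheckedRWLock and kUseNewRunLockBehavior are hypothetical names, not LLDB code. POSIX leaves unlocking an unheld rwlock undefined, so assertions like these only catch errors the implementation actually reports.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical helper (not LLDB code): in DEBUG builds (NDEBUG not defined),
// assert that a pthread call returned 0. Release builds pass the result on.
static inline int CheckedPthreadCall(int result) {
  assert(result == 0 && "pthread rwlock call returned an error");
  return result;
}

// Hypothetical wrapper around pthread_rwlock_t whose every call is checked.
class DebugCheckedRWLock {
public:
  DebugCheckedRWLock() { CheckedPthreadCall(pthread_rwlock_init(&m_rwlock, nullptr)); }
  ~DebugCheckedRWLock() { CheckedPthreadCall(pthread_rwlock_destroy(&m_rwlock)); }
  DebugCheckedRWLock(const DebugCheckedRWLock &) = delete;
  DebugCheckedRWLock &operator=(const DebugCheckedRWLock &) = delete;

  int ReadLock() { return CheckedPthreadCall(pthread_rwlock_rdlock(&m_rwlock)); }
  int ReadUnlock() { return CheckedPthreadCall(pthread_rwlock_unlock(&m_rwlock)); }
  int WriteLock() { return CheckedPthreadCall(pthread_rwlock_wrlock(&m_rwlock)); }
  int WriteUnlock() { return CheckedPthreadCall(pthread_rwlock_unlock(&m_rwlock)); }

private:
  pthread_rwlock_t m_rwlock;
};

// Hypothetical platform gate: enable the new run-lock behavior only on Apple
// platforms until the other platforms are sorted out.
#if defined(__APPLE__)
constexpr bool kUseNewRunLockBehavior = true;
#else
constexpr bool kUseNewRunLockBehavior = false;
#endif
```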

> Thanks!
>  
> - Ashok
>  
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Greg Clayton
> Sent: Wednesday, April 17, 2013 8:26 PM
> To: Malea, Daniel
> Cc: [email protected]
> Subject: Re: [lldb-dev] Locking issues on windows
>  
>  
> On Apr 17, 2013, at 4:01 PM, "Malea, Daniel" <[email protected]> wrote:
>  
> > So, it looks like the locks are going awry in several places.
> >
> > Carlo, I can confirm that your fix resolves some of the hangs that
> > everyone is experiencing but not all. Specifically, the
> > TestInlineStepping.py seems to still deadlock on the acquisition of
> > one of the Process (public) locks during a Resume(). That said,
> > toggling the lock in the constructor doesn't seem like a sound workaround.
>  
> Agreed, this shouldn't be the fix we use. We should track when we are doing 
> an attach and lock it when the attach starts.
>  
> > Greg,
> >
> > 179329 is the commit that seems to have made things go all sideways.
> > After that commit, no Debian users can install a package that doesn't
> > deadlock on startup, we have no visibility into the testing status on
> > the buildbots, and the commit itself seems scary as it exposes a
> > reference to one of two internal locks to users based on what thread 
> > they're running in.
> >
> > After briefly studying the Process class, I'm a little worried about
> > the complexity of the design. Could you explain the reason 2 different
> > R/W locks are needed? I understand why one R/W lock makes sense in the
> > class, but two seem overly complicated.
>  
> We currently need to avoid doing things while the process is running. There 
> are two cases we care about:
> - the public state tracking when we are running
> - the private state tracking when we are running
>  
> The main reason we need this is the private process state thread handles some 
> complex things for us when it is handling the process. One example is the 
> OperatingSystemPlugins (like OperatingSystemPython) where it may get called 
> from the private process state thread to update the thread list. A common 
> thing to do in the OperatingSystemPython is to read a global list in the 
> kernel that contains the thread list and follow a linked list. If we run and 
> need to determine if we should stop, we often need to update our thread list. 
> This update will happen on the private process thread. So the flow goes like 
> this:
>  
> The old problem was:
>  
> 1 - (main thread) user says "step over"
> 2 - (main thread) initiates the process control and the public process write 
> lock is taken
> 3 - (private process thread) run and stop after each "trace" while doing the 
> single step
> 4 - (private process thread) updates the thread list which calls into the 
> OperatingSystemPython which wants to use the public LLDB API
> 5 - (private process thread) goto 3 until step is done
>  
> The problem is step 4 fails because the OperatingSystemPython used lldb::SB 
> APIs that require the public process write lock in order to evaluate 
> expressions and use anything that requires that the process is stopped.
>  
> To get around this we introduced the private read/write process lock to track 
> when the process state thread is stopped so we can actually use the public 
> APIs. So the flow is now:
>  
> 1 - (main thread) user says "step over"
> 2 - (main thread) initiates the process control and the public process write 
> lock is taken
> 3 - (private process thread) lock private process write lock
> 4 - (private process thread) run and stop after each "trace" while doing the 
> single step
> 5 - (private process thread) unlock private process write lock
> 6 - (private process thread) updates the thread list which calls into the 
> OperatingSystemPython which wants to use the public LLDB API
> 7 - (private process thread) goto 3 until the step is done
>  
> This lets us use the public APIs by allowing the private process state thread 
> to lock a different lock and manage when the private state thread is locked.
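The two-lock flow above can be modeled as a small toy, assuming that the lock an API call consults is chosen by the calling thread (the behavior Dan describes for r179329). toy::Process, StoppedOnlyAPI and SimulateStepOver are hypothetical, and std::shared_timed_mutex (C++14) stands in for LLDB's own read/write lock class. A thread other than the private state thread is refused mid-step, while the OS plugin running on the private state thread can still use the API between traces.

```cpp
#include <cassert>
#include <shared_mutex>
#include <thread>
#include <vector>

namespace toy {

// True only on the thread playing the private process state thread.
thread_local bool t_is_private_state_thread = false;

struct Process {
  std::shared_timed_mutex public_run_lock;   // write-held while the user-visible state is "running"
  std::shared_timed_mutex private_run_lock;  // write-held only while the inferior actually runs

  // Which lock an API call checks depends on the calling thread.
  std::shared_timed_mutex &GetRunLock() {
    return t_is_private_state_thread ? private_run_lock : public_run_lock;
  }

  // Stand-in for an SB API call that needs a stopped process: it succeeds only
  // if it can read-lock the run lock chosen for the calling thread.
  bool StoppedOnlyAPI() {
    std::shared_timed_mutex &run_lock = GetRunLock();
    if (!run_lock.try_lock_shared())
      return false;  // the process is running: refuse
    // ... reading memory, evaluating expressions, etc. would go here ...
    run_lock.unlock_shared();
    return true;
  }
};

struct StepOverResult {
  bool other_thread_api_ok = true;          // API call from another thread mid-step
  std::vector<bool> private_thread_api_ok;  // OS-plugin API calls after each trace
};

// Walks through steps 2-7 of the flow above with a fixed number of traces.
inline StepOverResult SimulateStepOver(int num_traces) {
  Process process;
  StepOverResult result;

  // Step 2: the main thread takes the public run lock for writing.
  process.public_run_lock.lock();

  // Any other thread that calls the API mid-step is refused.
  std::thread other_client([&] { result.other_thread_api_ok = process.StoppedOnlyAPI(); });
  other_client.join();

  std::thread private_state_thread([&] {
    t_is_private_state_thread = true;
    for (int trace = 0; trace < num_traces; ++trace) {
      process.private_run_lock.lock();    // step 3
      // step 4: resume and stop after one "trace" (elided)
      process.private_run_lock.unlock();  // step 5
      // step 6: the thread-list update calls the OS plugin, which uses the API
      result.private_thread_api_ok.push_back(process.StoppedOnlyAPI());
    }  // step 7: repeat until the step is done
  });
  private_state_thread.join();

  // The step is complete: the main thread releases the public run lock.
  process.public_run_lock.unlock();
  return result;
}

}  // namespace toy
```

In this toy the public run lock is locked and unlocked by the same thread; a std::shared_timed_mutex may only be unlocked by its owner.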
>  
> This is a problem for other things that use python during the lifetime of the 
> process. For instance, we want to eventually have some python code that gets 
> called when a process is about to resume, or just after it stops. We would 
> like to simplify the code for breakpoints that have commands that get run 
> when the breakpoint is hit (right now we defer any actions until the user 
> consumes the public stop event).
>  
>  
> > You mentioned that you'll improve the R/W (scoped?) locking classes.
> > Any reason to not use boost (or some other C++11 library) for this? If
> > we do have to roll our own in LLDB, the lack of tests is worrisome.
>  
> I am not a big fan of boost as it bloats the C++ program debug info to be so 
> large that it often makes debugging the boost programs very difficult due to 
> the sheer size of the debug info. Most of what we cared about from boost is 
> now in C++11. Even if we did use boost, would it actually check to see if the 
> lock was taken prior to trying to release it? The APIs on read/write locks 
> are dead simple, so I don't see this as a reason to use boost.
>  
> > If the improvements to the R/W locker classes you've got in progress
> > don't allow the test suite to run to completion, could you please
> > revert 179329 until we have something that allows us to run the tests?
> > Lots of patches are backed up atm due to the LLVM practice of not
> > committing on top of a broken trunk.
>  
> Yes, I am trying to get us access to a linux machine that we can all use here 
> at Apple so we can debug and fix the things we break.
>  
> I spent a large part of the weekend trying to get Ubuntu 12.04 (using 
> Parallels Desktop (virtualization software)) building llvm/clang/lldb so that 
> I can fix these issues. I wasn't able to get clang to build as the link stage 
> would always get killed with a signal 9. Not sure why, maybe the 
> virtualization software was running out of RAM or resources. The build 
> instructions up on the web for Linux don't actually work on a fresh install 
> of Ubuntu. I needed to install new packages for tools essentials and also 
> install gcc-4.7 and try to figure out how to get LLVM to use these compilers 
> to get things to build with C++11; otherwise the build wouldn't even 
> configure with gcc-4.6, because with --enable-libcpp configure quickly 
> reported that one of the options wasn't supported by the compiler.
>  
> So the linux builds are frustrating to try and get working, but I do want 
> everyone to know that I am trying.
>  
> What compiler do you build with on linux? Are there packages to install for a 
> suitable version of clang? I finally gave up after many, many hours of trying 
> to get lldb to build.
>  
> Greg
>  
> >
> >
> > Dan
> >
> > PS. The hanging buildbots to watch are:
> >
> > http://lab.llvm.org:8011/builders/lldb-x86_64-darwin11/builds/1890
> > http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang
> >
> > http://lab.llvm.org:8011/builders/lldb-x86_64-linux
> >
> >
> > On 2013-04-17 12:47 PM, "Greg Clayton" <[email protected]> wrote:
> >
> >>
> >> On Apr 17, 2013, at 1:27 AM, Carlo Kok <[email protected]> wrote:
> >>
> >>> I'm trying to update the Windows branch to the latest and greatest
> >>> and found these locking issues (not sure if they're relevant for posix 
> >>> too):
> >>>
> >>> When I attach a process (I only use the gdb remote) the first event I
> >>> get is "stopped" which tries to unlock m_private_run_lock, however
> >>> this one is never locked in the first place. Windows' writelock
> >>> doesn't appreciate that; as a workaround I added a
> >>> m_private_run_lock.WriteLock(); in Process' constructor, which seems
> >>> to fix that.
> >>
> >> We need to fix this better by locking the private run lock when
> >> attaching if all goes well.
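One way to lock the private run lock when the attach starts, sketched under the assumption that the first event after a successful attach is a private "stopped" event that releases it. ToyProcess, Attach, HandlePrivateStopEvent and PrivateStateIsRunning are hypothetical. A std::shared_timed_mutex must be unlocked by the thread that locked it, so in this toy Attach() and HandlePrivateStopEvent() run on the same thread; code that locks and unlocks on different threads would need a different primitive.

```cpp
#include <cassert>
#include <shared_mutex>
#include <thread>

// Hypothetical sketch (not LLDB's Process): the private run lock is taken
// when the attach starts, so the first private "stopped" event has a held
// lock to release.
class ToyProcess {
public:
  // remote_attach_succeeds stands in for the real attach request's outcome.
  bool Attach(bool remote_attach_succeeds) {
    m_private_run_lock.lock();  // the inferior is about to be running
    if (!remote_attach_succeeds) {
      m_private_run_lock.unlock();  // keep lock/unlock balanced on failure
      return false;
    }
    return true;  // stays locked until the first stop event
  }

  // The first event after a successful attach is "stopped"; this unlock is
  // now balanced by the lock taken in Attach().
  void HandlePrivateStopEvent() { m_private_run_lock.unlock(); }

  // Probes from a separate thread, since a thread must not try to re-acquire
  // a std::shared_timed_mutex that it already owns.
  bool PrivateStateIsRunning() {
    bool running = false;
    std::thread probe([&] {
      running = !m_private_run_lock.try_lock_shared();
      if (!running)
        m_private_run_lock.unlock_shared();
    });
    probe.join();
    return running;
  }

private:
  std::shared_timed_mutex m_private_run_lock;
};
```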
> >>
> >>>
> >>> The second issue occurs when trying to cause a "Stop" when it's
> >>> already paused on internal breakpoints; for me this is during slow
> >>> symbol load. What happens is that the loading (which happens from
> >>> within Process::ShouldBroadcastEvent) resumes it, then the process
> >>> exits properly (triggering ShouldBroadcastEvent again); however:
> >>>
> >>> ProcessEventData::DoOnRemoval(lldb_private::Event * event_ptr)
> >>> called by Listener::FindNextEventInternal.
> >>>
> >>> The resume call is in this condition:
> >>> if (state != eStateRunning)
> >>
> >> Where is the above "if (state != eStateRunning)"?
> >>
> >>> Changing that to:
> >>> lldb::StateType state = m_process_sp->GetPrivateState();
> >>> if (state != eStateRunning && state != eStateCrashed &&
> >>>     state != eStateDetached && state != eStateExited)
> >>
> >> There are functions that indicate whether the process is stopped or
> >> running. We should use those functions (search for "StateIsStopped").
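A sketch of centralizing state checks in helper predicates instead of enumerating states at each call site. The State enum and the StateIsRunning, StateIsGone, StateIsStopped and ShouldResumeAfterInternalStop helpers are hypothetical stand-ins that only echo LLDB's eState* names; they are not LLDB's actual declarations. The resume guard here is a judgment call and is intentionally narrower than Carlo's enumerated condition: it resumes only from Stopped or Suspended.

```cpp
#include <cassert>

namespace toy_state {

// Simplified stand-in for a process state enum (not LLDB's declaration).
enum class State {
  Invalid, Unloaded, Connected, Attaching, Launching, Stopped,
  Running, Stepping, Crashed, Detached, Exited, Suspended
};

// The process is running, or is on its way to running.
constexpr bool StateIsRunning(State s) {
  return s == State::Attaching || s == State::Launching ||
         s == State::Running || s == State::Stepping;
}

// There is no live process left to resume.
constexpr bool StateIsGone(State s) {
  return s == State::Unloaded || s == State::Detached || s == State::Exited;
}

// The process is stopped. When must_exist is false, "gone" states also count.
constexpr bool StateIsStopped(State s, bool must_exist) {
  return s == State::Stopped || s == State::Crashed || s == State::Suspended ||
         (!must_exist && StateIsGone(s));
}

// One possible resume guard: resume only a live, stopped process that has
// not crashed. Exited or detached processes are never resumed, so no unlock
// is left waiting on a process that will not stop again.
constexpr bool ShouldResumeAfterInternalStop(State s) {
  return StateIsStopped(s, /*must_exist=*/true) && s != State::Crashed;
}

}  // namespace toy_state
```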
> >>
> >>>
> >>> Seems to fix it, as there's no reason to try & resume a process
> >>> that's not running in the first place (and since exiting doesn't
> >>> unlock a process this causes a deadlock)
> >>>
> >>> The last issue is this:
> >>> void *Process::RunPrivateStateThread() calls
> >>> m_public_run_lock.WriteUnlock() when it's done. Finalize also
> >>> unlocks that same lock, which crashes on Windows.
> >>> Commenting that out, it seems to work stably.
> >>
> >> We need to build some smarts into our Read/Write locking class so it
> >> knows whether the read or write lock is taken and only unlocks if the
> >> corresponding read/write lock is locked. I will make this change today.
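A minimal sketch of such a lock, assuming the goal is to turn an unbalanced unlock (the "stopped" event after attach, or the second WriteUnlock() in Finalize) into a reported failure instead of undefined behavior. TrackedRWLock is hypothetical, and std::shared_timed_mutex (C++14) stands in for whatever primitive the real class wraps. It only tracks whether some thread holds the lock, not whether the caller does, and the underlying unlock must still be called by the owning thread.

```cpp
#include <atomic>
#include <cassert>
#include <shared_mutex>

// Hypothetical sketch, not LLDB's class: a read/write lock that remembers
// whether it is held, so an unbalanced unlock is refused (returns false)
// instead of reaching the underlying lock.
class TrackedRWLock {
public:
  bool WriteLock() {
    m_rwlock.lock();
    m_write_locked.store(true);
    return true;
  }

  // Only unlock if the write lock is actually held.
  bool WriteUnlock() {
    bool expected = true;
    if (!m_write_locked.compare_exchange_strong(expected, false))
      return false;  // not write-locked: nothing to release
    m_rwlock.unlock();
    return true;
  }

  bool ReadLock() {
    m_rwlock.lock_shared();
    m_readers.fetch_add(1);
    return true;
  }

  // Only unlock if at least one read lock is held.
  bool ReadUnlock() {
    int readers = m_readers.load();
    while (readers > 0) {
      // On failure, compare_exchange_weak reloads the current count into readers.
      if (m_readers.compare_exchange_weak(readers, readers - 1)) {
        m_rwlock.unlock_shared();
        return true;
      }
    }
    return false;  // no read lock held
  }

  bool IsWriteLocked() const { return m_write_locked.load(); }

private:
  std::shared_timed_mutex m_rwlock;
  std::atomic<bool> m_write_locked{false};
  std::atomic<int> m_readers{0};
};
```

With this shape, the extra unlock in Finalize would simply return false, and a DEBUG build could assert on that return value to find the unbalanced caller.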
> >>
> >>>
> >>>
> >>> Anyone see any issues in all of this? (might make sense to apply
> >>> this to trunk too; it's never good to have unbalanced lock/unlocks)
> >>> _______________________________________________
> >>> lldb-dev mailing list
> >>> [email protected]
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> >>
> >
>  
