On Fri, Aug 23, 2019 at 1:25 PM, Mark Wielaard <m...@klomp.org> wrote:
Hi,
On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote:
On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard <m...@klomp.org> wrote:
On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote:
> > For parallel applications that need the information in the DIEs, the
> > Dwarf_Abbrev hash table et al. become a massive data race. This fixes
> > that by:
> >
> > 1. Adding atomics & locks to the hash table to manage concurrency
> >    (lib/dynamicsizehash_concurrent.{c,h})
> > 2. Adding a lock & array structure to the memory manager (pseudo-TLS)
> >    (libdwP.h, libdw_alloc.c)
> > 3. Adding extra configure options for Helgrind/DRD annotations
> >    (configure.ac)
> > 4. Including "stdatomic.h" from FreeBSD, to support C11-style atomics.
> >    (lib/stdatomic.h)
>
> This looks like really nice work. Thanks!
>
> I am splitting review in some smaller parts if you don't mind.
> Simply because it is large and I cannot keep everything in my head at
> once :)
BTW. I would prefer to handle this as 4 separate additions, probably in
this order:
1) configure stuff for valgrind annotations.
2) add support for stdatomic.h functions.
3) thread-safe obstack memory handling
4) concurrent dynamic hash table.
Sure thing, I can split the patch into bits over the weekend. I may
take your advice and just use git request-pull though.
> If the compiler provides stdatomic.h then I think it would be good to
> use that instead of our own implementation. The copyright isn't a
> problem. But do you have a reference/URL to the upstream version? I
> like to add that somewhere, so we can sync with it in the future. I
> see various commented-out parts. Was that already upstream? Should we
> just remove those parts?
It would definitely be preferable to use the compiler's implementation
if possible; we included this copy in case GCC 4.7 and 4.8 (RHEL 7)
compatibility was needed. If those versions no longer need to be
supported, the file can be removed entirely.
The upstream is at
https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h,
although the version here appears to be slightly modified (we used the
version that HPCToolkit ships). The components we use don't seem
affected, so a resync shouldn't make a difference.
OK, then we should come up with some kind of configure test to see if
we can use the standard stdatomic.h and otherwise use our own. I am
surprised I cannot find other projects doing this. Would be nice to
"steal" something standard for this.
At least OpenSSL does it:
https://github.com/openssl/openssl/blob/master/include/internal/refcount.h,
the interesting note being that it has a series of fallbacks (various
compiler builtins and then locks). The other projects I skimmed just
have the fallbacks and don't check for C11; given that Elfutils only
supports GCC, that might be a valid (and more compact) approach.
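
For illustration, a minimal sketch of such a fallback chain (the wrapper
header name and the bundled-header path are hypothetical; the
__STDC_NO_ATOMICS__ test is the standard C11 feature check):

    /* Hypothetical wrapper header, a sketch of the fallback approach. */
    #ifndef ATOMICS_WRAPPER_H
    #define ATOMICS_WRAPPER_H

    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
        && !defined(__STDC_NO_ATOMICS__)
    /* C11 compiler with real atomics: use the system header.  */
    # include <stdatomic.h>
    #elif defined(__GNUC__)
    /* Older GCC (e.g. 4.7/4.8): fall back to the bundled FreeBSD header,
       which maps the C11 API onto __atomic_* or __sync_* builtins.  */
    # include "lib/stdatomic.h"
    #else
    # error "no C11 atomics and no known fallback"
    #endif

    #endif /* ATOMICS_WRAPPER_H */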
> > - Currently the concurrent hash table is always enabled;
> >   performance-wise there is no known difference between it and the
> >   non-concurrent version. This can be changed to toggle with
> >   --enable-thread-safety if preferred.
>
> I would prefer it always enabled, unless there is a massive slowdown
> of the single-threaded case. The problem with --enable-thread-safety
> is that it is a) known broken (sigh) and b) it basically introduces
> two separate libraries that behave subtly differently. I would very
> much like to get rid of --enable-thread-safety by fixing the broken
> locking and simply making it the default.
I haven't noticed any slowdown in the single-threaded case, although I
haven't stressed it hard enough to find out for certain. From a
theoretical standpoint it shouldn't: atomics (with the proper memory
orders) are usually (on x86 at least) as cheap as normal accesses when
used by a single thread, and the algorithm is otherwise effectively the
same as the original hash table.
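
As a concrete illustration of that point (hypothetical names; this is
not code from the patch), relaxed atomic accesses like the ones below
compile to the same plain mov instructions as non-atomic accesses on
x86-64:

    #include <stdatomic.h>
    #include <stddef.h>

    /* Hypothetical counter, standing in for a hash-table field. */
    static _Atomic size_t table_filled;

    size_t read_filled (void)
    {
      /* Relaxed load: no fence on x86; an ordinary load instruction. */
      return atomic_load_explicit (&table_filled, memory_order_relaxed);
    }

    void set_filled (size_t n)
    {
      /* Relaxed store: likewise an ordinary store on x86. */
      atomic_store_explicit (&table_filled, n, memory_order_relaxed);
    }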
How difficult would it be to fix the locking (or rather, what's
"broken")? We would definitely benefit from having thread-safety at
least for getters, which would only need locks around the internal
management.
To be honest I don't know how badly it is broken.
It is only implemented for libelf.
If you configure --enable-thread-safety and make check you will see
several tests fail because they abort with Unexpected error: Resource
deadlock avoided.
I think it is mainly that nobody maintained the locks and now some are
just wrongly placed. Ideally we switch --enable-thread-safety on by
default, identify which locks are wrongly placed, run all tests with
valgrind/helgrind and fix any issues found.
It really has not been a priority. Sorry.
No worries, it's not a priority on our end either. Elfutils' codebase is
significantly simpler (IMHO) than ours, so if it ever comes up we'll
just submit another patch.
> > - Another implementation of #2 above might use dynamic TLS
> >   (pthread_key_*); we chose this implementation to reduce the
> >   overall complexity.
>
> Are there any other trade-offs?
If the application spawns N threads that all use a Dwarf object (same
or different) enough to cause an allocation, and then joins them all,
any Dwarf objects allocated afterwards will allocate N unusable slots
in the mem_tails array. Thus long-running applications (that don't use
thread pools) would start experiencing effects similar to a memory
leak, of one pointer's worth (8 bytes on 64-bit) per dead thread.
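
A minimal sketch of that id scheme (hypothetical names, not the patch's
actual code) makes the leak mechanism visible: ids only ever grow, so
slots belonging to joined threads are never reused:

    #include <stdatomic.h>

    /* Hypothetical pseudo-TLS id scheme as described above. */
    static atomic_size_t next_id;          /* only ever increases */
    static _Thread_local size_t my_id;     /* this thread's slot index */
    static _Thread_local int have_id;

    static size_t get_thread_id (void)
    {
      if (!have_id)
        {
          /* Each new thread permanently consumes one id... */
          my_id = atomic_fetch_add (&next_id, 1);
          have_id = 1;
        }
      return my_id;
    }

    /* ...so a Dwarf opened after N threads have exited must still size
       its per-thread array past their ids, e.g.:
         dwarf->mem_tails = calloc (atomic_load (&next_id),
                                    sizeof (void *));
       leaving N slots (one pointer each) that will never be used.  */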
The alternative is to use dynamic TLS so that only threads that are
currently active use the extra memory, assuming libpthread is
sufficiently proactive about reclaiming unused key values. I think if
we assume `dwarf_end` happens-after any memory management (which would
make sense for a destructor), there should be a simple atomic pattern
to handle the freeing, but I would need to sit down for a while to
work out a sufficient proof.
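
For reference, the dynamic-TLS alternative would look roughly like this
(a hedged sketch; how this destructor interacts with dwarf_end freeing
the same blocks is exactly the open question above):

    #include <pthread.h>
    #include <stdlib.h>

    /* Hypothetical per-Dwarf key, created at dwarf_begin time with:
         pthread_key_create (&mem_key, free_thread_block);  */
    static pthread_key_t mem_key;

    static void free_thread_block (void *block)
    {
      /* Runs at thread exit for every thread that touched this Dwarf,
         so only live threads hold per-thread memory.  */
      free (block);
    }

    static void *get_thread_block (size_t size)
    {
      void *block = pthread_getspecific (mem_key);
      if (block == NULL)
        {
          block = malloc (size);
          pthread_setspecific (mem_key, block);
        }
      return block;
    }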
I was also under the impression that dynamic TLS was particularly
expensive performance-wise, although I haven't experimented with it
myself enough to know. The compiler can be a little smarter about
static TLS, and the result is easier to reason about, which is why we
chose this implementation for the initial patch. If the alternative
would be preferable we can change it.
I thought a bit about this one and although I am slightly worried
about the possible indefinite growth of the thread_ids, I don't think
the "memory leak" is an issue. Concurrent usage of the Dwarf object
already costs a bit more memory (since each thread gets its own memory
block if they allocate at the same time), which is probably larger
than any extra created by reserving space for all possible thread_ids.
This is only really a problem for a program that doesn't use thread
pools and keeps opening and concurrently accessing new Dwarf objects
(because at a certain point the whole Dwarf will have been read and no
new allocations happen). Although it would be nice if we could somehow
reset the next_id to zero in dwarf_end (), when this is the last
thread or Dwarf object.
The memory overhead is a little worse than that (each thread allocates
its own memory blocks, period), but that would be present in both
implementations. I can't think of a simple way to increase the memory
efficiency past that (although I can think of some ridiculously complex
ways).
I suppose it would be possible to use a sort of free-list for the IDs,
although that requires a hook at thread exit (doable with PThreads, not
with C11) and cleaning up would be a bit of a nightmare. At some point
dynamic TLS is more robust against weird situations (and, if my
thought-proof is correct, simpler).
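
To make that trade-off concrete, an id free-list could look roughly
like this (purely a hypothetical sketch: it leans on a pthread key
destructor as the thread-exit hook, and it ignores the ABA problem and
the teardown ordering that make cleanup a nightmare):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    /* Hypothetical Treiber-stack free-list of thread ids.  Assumes
       id_key was created once with:
         pthread_key_create (&id_key, release_id);  */
    struct free_id { size_t id; struct free_id *next; };

    static struct free_id *_Atomic free_ids;
    static atomic_size_t next_id;
    static pthread_key_t id_key;

    static void release_id (void *node_)
    {
      /* Thread-exit hook: push this thread's id back on the list. */
      struct free_id *node = node_;
      node->next = atomic_load (&free_ids);
      while (!atomic_compare_exchange_weak (&free_ids, &node->next, node))
        ;
    }

    static size_t acquire_id (void)
    {
      /* Pop a recycled id if one exists (ignoring ABA for brevity)... */
      struct free_id *node = atomic_load (&free_ids);
      while (node != NULL
             && !atomic_compare_exchange_weak (&free_ids, &node,
                                               node->next))
        ;
      if (node == NULL)
        {
          /* ...otherwise mint a fresh one, as in the current scheme. */
          node = malloc (sizeof *node);
          node->id = atomic_fetch_add (&next_id, 1);
        }
      pthread_setspecific (id_key, node);   /* arms release_id at exit */
      return node->id;
    }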
Cheers,
Mark