Re: [Libunwind-devel] Updated fast trace patch with initial performance results

Paul Pluzhnikov Tue, 05 Apr 2011 16:21:34 -0700

On Tue, Apr 5, 2011 at 2:08 PM, Paul Pluzhnikov <[email protected]> wrote:

> Now all we have to do is figure out how to fix it ;-)

I see a couple of possible solutions:

1. Document the problem; ask users to call backtrace() early (before
   calling pthread_key_create too many times), so it gets one of the
   "pre-allocated" descriptors.

2. Arrange for libunwind __attribute__((constructor)) function to do the
   same, and hope that it fires early enough.

3. Switch to using __thread, figure out some (likely extremely non-portable)
   way to perform cleanup on thread termination.

All of 1, 2 and 3 are non-portable -- there is no guarantee that
pthread_key_create will not *alloc every time it is invoked, nor are the
pthread_* functions async-signal-safe.
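
For illustration, option 2 could look roughly like the sketch below. The
names trace_cache_key, trace_cache_key_ok and trace_cache_key_init are
made up for this example and are not libunwind symbols; the only claim is
that a constructor grabs a TSD key as early as it can, which helps only if
it runs before other code has used up the pre-allocated descriptors.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names, not part of libunwind. */
static pthread_key_t trace_cache_key;
static int trace_cache_key_ok;

/* TSD destructor: runs at thread exit for threads with a non-NULL value. */
static void trace_cache_free(void *cache)
{
    free(cache);
}

/* Runs when the library is loaded, before main(); ordering relative to
 * other constructors is not guaranteed, hence "hope it fires early". */
__attribute__((constructor))
static void trace_cache_key_init(void)
{
    trace_cache_key_ok =
        (pthread_key_create(&trace_cache_key, trace_cache_free) == 0);
}
```

Nothing here makes the later pthread_setspecific call safe in signal
context; it only moves the key creation to a point where allocation is
less likely to hurt.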

On Tue, Apr 5, 2011 at 3:13 PM, Lassi Tuura <[email protected]> wrote:

> How far do we want to go in attempting to avoid the one calloc()? :-)
> Choices seem to be:
>  a. Use __thread, require per-thread wrapper callbacks from app

In the context of e.g. a malloc stack recorder, an application callback is
generally not sufficient.

Consider: the application is about to call pthread_exit, so it calls the
libunwind callback, which frees the per-thread cache for the current thread.
The app then calls pthread_exit.

Now the fun begins: pthread_exit calls __libc_thread_freeres, which calls
free(), which calls the unwinder, which reallocates the per-thread trace
cache, which is then leaked.

I think the best you can do is mark the per-thread cache as likely to
become cold soon, and deallocate it some time later (effectively turning
this into option b).

>  b. Use lock-free global cache stack, must still free 'unused' caches.
>  c. Use pthread_getspecific, deal with calloc from pthread_key_create,
>    maybe require app to call some init function once at 'safe' time if
>    it uses unw_backtrace?

In general, option c has the same problem for a malloc stack recorder: the
very first call to backtrace() may well come from within a libc-internal
call to calloc(), and an attempt to call pthread_setspecific at that point
may be unsafe, and the app has not even gained execution control yet!

OTOH, for glibc this wouldn't be a problem, as pthread_setspecific will
not call calloc() before 32 TSD keys have been created.

> I guess I'd go with c, b, then a. We can call once to get the key created
> at a safe time (= initialisation for our profiler), then never need to
> worry about destructor calls and don't need per-thread callbacks. Failing
> that I think I'd prefer b over a.

I think the only completely automatic and reasonably portable solution is B,
though it *is* going to a lot of trouble for a problem we don't really have ;-(
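
To give a feel for the kind of machinery B implies, here is a deliberately
simplified stand-in: a fixed-size pool of caches claimed with an atomic
exchange, rather than a true lock-free stack (which would also have to
deal with ABA and with freeing unused caches). All names and the pool
size are assumptions for this sketch, not anything from the patch.

```c
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_POOL_SIZE 4          /* arbitrary size for the sketch */

/* Hypothetical stand-in for the real trace cache. */
struct trace_cache {
    atomic_int in_use;             /* 0 = free, 1 = claimed */
    void *frames[64];
};

/* Static storage, so every in_use flag starts out as 0. */
static struct trace_cache cache_pool[CACHE_POOL_SIZE];

/* Claim the first free cache without taking a lock; NULL if none is free. */
struct trace_cache *cache_acquire(void)
{
    for (size_t i = 0; i < CACHE_POOL_SIZE; i++) {
        /* The exchange returns the previous value: 0 means we won the slot. */
        if (atomic_exchange_explicit(&cache_pool[i].in_use, 1,
                                     memory_order_acquire) == 0)
            return &cache_pool[i];
    }
    return NULL;                   /* caller falls back to the slow unwind */
}

/* Hand a cache back to the pool. */
void cache_release(struct trace_cache *cache)
{
    atomic_store_explicit(&cache->in_use, 0, memory_order_release);
}
```

A cache here is borrowed per backtrace rather than owned per thread, so
there is no thread-exit cleanup to get wrong, at the cost of losing the
per-thread locality that made the cache fast in the first place.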

How about a variation of C:

4. Require the app to call e.g. libunwind_per_thread_init() from a safe
   context for each thread in which it desires fast backtrace().
   This call will allocate trace cache and do pthread_setspecific.

   In tdep_trace(), if pthread_getspecific() returns NULL, then fall back
   to the slow unwind.
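
A minimal sketch of how 4 might fit together, under some assumptions:
libunwind_per_thread_init() is the name proposed above, while struct
trace_cache, the key variables and choose_trace_path() (standing in for
the check inside tdep_trace()) are invented for illustration. Whether
pthread_getspecific is acceptable in the contexts the unwinder runs in is
still an open question; the sketch only shows the control flow.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real per-thread trace cache. */
struct trace_cache { void *frames[64]; };

static pthread_key_t trace_cache_key;
static pthread_once_t trace_cache_once = PTHREAD_ONCE_INIT;
static atomic_int trace_cache_key_ok;  /* set once the key exists */

static void trace_cache_destroy(void *cache)
{
    free(cache);
}

static void trace_cache_make_key(void)
{
    if (pthread_key_create(&trace_cache_key, trace_cache_destroy) == 0)
        atomic_store(&trace_cache_key_ok, 1);
}

/* Called by the app from a safe context, once per thread that wants
 * fast backtrace(). Returns 0 on success, -1 on failure. */
int libunwind_per_thread_init(void)
{
    pthread_once(&trace_cache_once, trace_cache_make_key);
    if (!atomic_load(&trace_cache_key_ok))
        return -1;
    if (pthread_getspecific(trace_cache_key) != NULL)
        return 0;                       /* already initialised */

    struct trace_cache *cache = calloc(1, sizeof *cache);
    if (cache == NULL)
        return -1;
    if (pthread_setspecific(trace_cache_key, cache) != 0) {
        free(cache);
        return -1;
    }
    return 0;
}

enum trace_path { TRACE_SLOW, TRACE_FAST };

/* Stand-in for the decision in tdep_trace(): never allocates, and uses
 * the fast path only if this thread was initialised beforehand. */
enum trace_path choose_trace_path(void)
{
    if (!atomic_load(&trace_cache_key_ok))
        return TRACE_SLOW;
    return pthread_getspecific(trace_cache_key) != NULL ? TRACE_FAST
                                                        : TRACE_SLOW;
}
```

A thread that never calls the init function simply keeps getting the slow
unwind, and the TSD destructor frees the cache at thread exit without the
unwinder ever reallocating it.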

Thanks,
-- 
Paul Pluzhnikov

_______________________________________________
Libunwind-devel mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/libunwind-devel
