On Thursday, October 6, 2016 12:55:52 PM CEST Robert Schöne wrote:
> Hello,
>
> Could it be that unwinding does not work well with threading?
>
> I run an Intel dual core system + Hyperthreading using Ubuntu 16.04
> and patched tests/Gperf-trace.c so that this part
<snip>

I'm the author of heaptrack and have seen dwarf-based unwinding add a
significant slow-down when profiling multi-threaded applications. The reason
is mostly the synchronization point within the many calls to
`dl_iterate_phdr` when encountering non-cached code locations. Once
everything is cached, libunwind is pretty fast and scales OK across threads.

I have submitted a patch to improve the per-thread caching functionality,
which has not been accepted upstream yet (the project is pretty much
unmaintained at the moment). Others have submitted patches that allow
replacing `dl_iterate_phdr` with something custom, which lets one cache the
`dl_iterate_phdr` results once and only update that cache when
dlopen/dlclose is called.

> According to perf and strace a significant amount of time is spent in
> the kernel, i.e. in sigprocmask.

Can you verify where sigprocmask is coming from, i.e. sample with call
stacks? I remember it being a problem once, but I don't think it's the main
culprit for thread scaling.

Unrelated to this: at this stage, I would recommend looking at an
alternative to libunwind. elfutils' libdwfl can also unwind the stack, and
is supposedly even faster at it. You have to write more code, but you can
also implement the address lookups manually, which sidesteps all of the
points above. For inspiration on how to do that, look at the backward-cpp
sources: https://github.com/bombela/backward-cpp

Cheers

-- 
Milian Wolff
[email protected]
http://milianw.de

_______________________________________________
Libunwind-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/libunwind-devel
