Hi,

Thanks for the answer, but I do not think that this would help me.
After some debugging I found that changing the caching_policy of unw_local_addr_space does not affect the as->caching_policy that is used in dwarf/Gparser.c:get_rs_cache. The functions get_rs_cache and put_rs_cache call lock_acquire and lock_release respectively, and these in turn call the sigprocmask syscall, which does not scale. If I initialize the caching with UNW_CACHE_PER_THREAD in x86_64/Ginit.c:x86_64_local_addr_space_init, the runtime is significantly better.

In my local branch I solved the problem by implementing two new functions in mi/init.c that are exposed by the library. With these functions one can get and set the default local caching policy. The default has to be set before any init_local is called and must not be changed after the first init_local call. (A rough sketch of such an interface is appended below, after the quoted message.)

Robert

On Thursday, 06.10.2016 at 18:25 +0200, Milian Wolff wrote:
> On Thursday, October 6, 2016 12:55:52 PM CEST Robert Schöne wrote:
> >
> > Hello,
> >
> > Could it be that unwinding does not work well with threading?
> >
> > I run an Intel dual core system + Hyperthreading using Ubuntu 16.04
> > and patched tests/Gperf-trace.c so that this part
> >
> I'm the author of heaptrack and have seen the dwarf-based unwinding add a significant slow-down when profiling multi-threaded applications. The reason is mostly the synchronization point within the many calls to `dl_iterate_phdr` when encountering non-cached code locations. Once everything is cached, libunwind is pretty fast and scales OK across threads.
>
> I have submitted a patch, which has not been accepted upstream yet (the project is pretty much unmaintained at the moment), to improve the per-thread caching functionality.
>
> Others have submitted patches to allow replacing `dl_iterate_phdr` with something custom, which allows one to cache the `dl_iterate_phdr` results once and only update that cache when dlclose/dlopen is called.
>
> > According to perf and strace a significant amount of time is spent in
> > the kernel, i.e. in sigprocmask.
>
> Can you verify where sigprocmask is coming from, i.e. sample with call stacks? I remember it being a problem once, but I don't think it's the main culprit for thread scaling.
>
> Unrelated to this: at this stage, I would recommend looking at an alternative to libunwind. elfutils' libdwfl can unwind the stack, and is supposedly even faster at it. You have to write more code, but you can also implement the address lookups manually, which invalidates all of the points above.
>
> For inspiration on how to do that, look at the backward-cpp sources:
> https://github.com/bombela/backward-cpp
>
> Cheers
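Roughly, the interface I mean looks like the sketch below. The function and variable names here are made up for illustration and are not part of released libunwind (and may differ from what is actually in my branch); the idea is simply that the per-arch init code, e.g. x86_64_local_addr_space_init, would pick up this process-wide default instead of its hard-coded policy.

/* Illustrative sketch only -- names are invented, not released libunwind API. */
#include <libunwind.h>

static unw_caching_policy_t default_local_caching_policy = UNW_CACHE_GLOBAL;

/* Query the policy that would be applied to unw_local_addr_space. */
unw_caching_policy_t
unw_get_default_local_caching_policy (void)
{
  return default_local_caching_policy;
}

/* Must be called before the first unw_init_local(); the value must not be
   changed afterwards, because the local address space is only set up once. */
void
unw_set_default_local_caching_policy (unw_caching_policy_t policy)
{
  default_local_caching_policy = policy;
}

An application that unwinds from many threads would then call unw_set_default_local_caching_policy(UNW_CACHE_PER_THREAD) once at startup, before any unwinding takes place.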

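PS: Regarding the suggestion above to replace `dl_iterate_phdr` with something custom: the sketch below shows the caching idea as I understand it. It is my own illustration, not code from the patches that were submitted upstream; it assumes one of the "custom dl_iterate_phdr" patches is applied so that libunwind can be pointed at cached_dl_iterate_phdr, and error handling is omitted.

#define _GNU_SOURCE
#include <link.h>
#include <pthread.h>
#include <stdlib.h>

static struct dl_phdr_info *phdr_cache;
static size_t phdr_cache_count;
static int phdr_cache_valid;
static pthread_mutex_t phdr_cache_lock = PTHREAD_MUTEX_INITIALIZER;

static int
fill_cache (struct dl_phdr_info *info, size_t size, void *data)
{
  (void) size; (void) data;
  phdr_cache = realloc (phdr_cache,
                        (phdr_cache_count + 1) * sizeof (*phdr_cache));
  /* Shallow copy: dlpi_name and dlpi_phdr still point into loader-owned
     memory, which is fine as long as the cache is refreshed on dlclose. */
  phdr_cache[phdr_cache_count++] = *info;
  return 0;  /* continue iteration */
}

/* Drop-in replacement with the same signature as dl_iterate_phdr(); it
   walks the loaded objects once and replays the cached list afterwards. */
int
cached_dl_iterate_phdr (int (*cb) (struct dl_phdr_info *, size_t, void *),
                        void *data)
{
  int rc = 0;

  pthread_mutex_lock (&phdr_cache_lock);
  if (!phdr_cache_valid)
    {
      phdr_cache_count = 0;
      dl_iterate_phdr (fill_cache, NULL);
      phdr_cache_valid = 1;
    }
  for (size_t i = 0; i < phdr_cache_count && rc == 0; i++)
    rc = cb (&phdr_cache[i], sizeof (phdr_cache[i]), data);
  pthread_mutex_unlock (&phdr_cache_lock);

  return rc;
}

/* Call this from wrappers around dlopen()/dlclose() to force a refresh. */
void
invalidate_phdr_cache (void)
{
  pthread_mutex_lock (&phdr_cache_lock);
  phdr_cache_valid = 0;
  pthread_mutex_unlock (&phdr_cache_lock);
}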