Hi Milian,

On Wed, Mar 21, 2018 at 02:01:41PM +0100, Milian Wolff wrote:
> Here's the code for the perf tools:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/unwind-libdw.c?h=perf/core#n52
>
> Here's the code for the perfparser:
>
> http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/perfsymboltable.cpp#n479
>
> Let's concentrate on perf for now, but perfparser has similar logic:
>
> We parse the mmap events in the perf.data file and store that information.
> Note that the perf.data file does not contain events for munmap calls. Then
> while unwinding the callstack of a perf sample, we lookup the most recent
> mmap event for every given instruction pointer address, and ensure that the
> corresponding ELF was registered with libdw.

So, modules are never deregistered? In that case, that might explain
the issue. But I see there is a check if there is already something at
the address.

The interface to "remove" a module might not be immediately clear. The
idea is that if modules need to be removed you call dwfl_report_begin,
possibly dwfl_report_elf for any new module, and then dwfl_report_end,
which has a callback that gets all old modules and decides whether to
re-report them or let them be removed. You might want to experiment
with doing that and not re-reporting any module that overlaps with the
new module. (See the libdwfl.h documentation for a hopefully clearer
description.)

> > Specifically are you using false for the add_p_vaddr argument?
>
> Yes, we are.
>
> > And could you provide some example where the reported address is
> > wrong/different from the start address of the Dwfl_Module?
>
> I don't think it's the start address that is wrong, rather it's the end
> address. But it's hard for me to come up with a small self-contained
> example at this stage. I am regularly seeing broken backtraces for
> samples where I have the gut feeling that missing reported ELFs are to
> blame. But we report everything, except for scenarios where the mmap
> events seemingly overlap. This overlapping is, as far as I can see,
> actually a side effect of remapping taking place in the dynamic linker
> (i.e. a single dlopen/dynamic linked library can yield multiple mmap
> events). One way or another, we end up with a situation where we cannot
> report an ELF to dwfl due to two issues:
>
> a) either ELF tells us we are overlapping some module and just stops which is
> bad, since we would actually much prefer the newly reported ELF to take
> precedence
>
> b) we find an mmap event with a non-zero pgoff, and have no clue how to
> call dwfl_report_elf and just give up.
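
For (a), here is a minimal sketch of the decision the dwfl_report_end
callback could make so that the newly reported ELF takes precedence. It
is plain C and deliberately does not call libdwfl: struct range,
ranges_overlap and select_modules_to_keep are hypothetical names made
up for illustration, and treating module ranges as half-open
[start, end) is an assumption of the sketch. Check libdwfl.h for the
exact callback contract before wiring something like this in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reported module or mmap event:
   the half-open address range [start, end) it covers.  */
struct range { uint64_t start, end; };

/* True if the two half-open ranges share at least one address.  */
bool ranges_overlap(struct range a, struct range b)
{
    assert(a.start < a.end && b.start < b.end);
    return a.start < b.end && b.start < a.end;
}

/* Decide which old modules survive when NEW_MAP is about to be
   reported.  keep[i] is set to true if old[i] should be re-reported
   and to false if it overlaps NEW_MAP and should be dropped.
   Returns the number of old modules kept.  */
size_t select_modules_to_keep(const struct range *old, size_t n,
                              struct range new_map, bool *keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = !ranges_overlap(old[i], new_map);
        kept += keep[i];
    }
    return kept;
}
```

In a real unwinder the same predicate would run once per old module
inside the callback, comparing that module's range against the range of
the mmap event being reported.
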
>
> In both cases, I was hoping for dwfl_report_module to help since it
> seemingly allows me to exactly recreate the mapping that was traced
> originally.

If you could add some logging and post that plus the eu-readelf -l
output of the ELF file, that might help track down what is really going
on.

Cheers,

Mark