On Mittwoch, 21. März 2018 14:01:41 CET Milian Wolff wrote: > On Dienstag, 20. März 2018 23:05:49 CET Mark Wielaard wrote: > > Hi Milian, > > Hey Mark :) > > > On Sat, Mar 17, 2018 at 02:14:48PM +0100, Milian Wolff wrote: > > > a recurring issue in the libdwfl integration of perf and perfparser are > > > supposedly overlapping modules. The perf data file contains the exact > > > mappings for all files corresponding to the actual mmap events that > > > occurred during runtime, e.g.: > > > > > > ``` > > > $ perf script --show-mmap-events | grep MMAP | grep stdc > > > heaptrack_print 13962 87163.483450: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aea84000(0x387000) @ 0 08:03 413039 3864781083]: r-xp > > > /usr/lib/libstdc+ +.so.6.0.24 > > > heaptrack_print 13962 87163.483454: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aebfc000(0x1ff000) @ 0x178000 08:03 413039 3864781083]: ---p > > > /usr/lib/ libstdc++.so.6.0.24 > > > heaptrack_print 13962 87163.483458: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aedfb000(0xd000) @ 0x177000 08:03 413039 3864781083]: rw-p > > > /usr/lib/ > > > libstdc++.so.6.0.24 > > > heaptrack_print 13962 87163.484860: PERF_RECORD_MMAP2 13962/13962: > > > [0x7fd0aedfb000(0xc000) @ 0x177000 08:03 413039 3864781083]: r--p > > > /usr/lib/ > > > libstdc++.so.6.0.24 > > > ``` > > > So far, both perf and perfparser are using dwfl_report_elf to report the > > > file. But that API is deducing the mapping addresses internally, which > > > may or may not be what we saw at runtime. I suspect that this is the > > > reason for some issues we are seeing, such as supposedly overlapping > > > modules. > > > > How exactly are you calling dwfl_report_elf? > > Here's the code for the perf tools: > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/ > perf/util/unwind-libdw.c?h=perf/core#n52 > > Here's the code for the perfparser: > > http://code.qt.io/cgit/qt-creator/perfparser.git/tree/app/ > perfsymboltable.cpp#n479 > > Let's concentrate on perf for now, but perfparser has similar logic: > > We parse the mmap events in the perf.data file and store that information. > Note that the perf.data file does not contain events for munmap calls. Then > while unwinding the callstack of a perf sample, we lookup the most recent > mmap event for every given instruction pointer address, and ensure that the > corresponding ELF was registered with libdw. > > > Specifically are you using false for the add_p_vaddr argument? > > Yes, we are. > > > And could you provide some example where the reported address is > > wrong/different from the start address of the Dwfl_Module? > > I don't think it's the start address that is wrong, rather it's the end > address. But it's hard for me to come up with a small selfcontained example > at this stage. I am regularly seeing broken backtraces for samples where I > have the gut feeling that missing reported ELFs are to blame. But we report > everything, except for scenarios where the mmap events seemingly overlap. > This overlapping is, as far as I can see, actually a side effect of > remapping taking place in the dynamic linker (i.e. a single dlopen/dynamic > linked library can yield multiple mmap events). One way or another, we end > up with a situation where we cannot report an ELF to dwfl due to two > issues: > > a) either ELF tells us we are overlapping some module and just stops which > is bad, since we would actually much prefer the newly reported ELF to take > precedence > > b) we find an mmap event that with a non-zero pgoff, and have no clue how to > call dwfl_report_elf and just give up. > > In both cases, I was hopeing for dwfl_report_module to help since it > seemingly allows me to exactly recreate the mapping that was traced > originally.
Here's one way to investigate where perf and dwfl disagree on the file
mappings:
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 7bdd239c795c..739c68b73772 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -48,11 +48,17 @@ static int __report_module(struct addr_location *al, u64
ip,
mod = 0;
}
- if (!mod)
+ if (!mod) {
+ Dwarf_Addr s, e;
mod = dwfl_report_elf(ui->dwfl, dso->short_name,
(dso->symsrc_filename ?
dso->symsrc_filename : dso-
>long_name), -1, al->map->start,
false);
+ dwfl_module_info(mod, NULL, &s, &e, NULL, NULL, NULL, NULL);
+ if (al->map->start != s || al->map->end != e)
+ pr_warning("MEH: %s | mmap: %zx %zx | dwfl: %zx %zx\n",
dso-
>short_name, al->map->start, al->map->end, s, e);
+ }
+
return mod && dwfl_addrmodule(ui->dwfl, ip) == mod ? 0 : -1;
}
On one of my perf.data files with many broken backtraces this gives me:
MEH: heaptrack_print | mmap: 56166e9d4000 56166ea39000 | dwfl: 56166e9d4000
56166ea38880
MEH: ld-2.26.so | mmap: 7fd0afc6c000 7fd0afe93000 | dwfl: 7fd0afc6c000
7fd0afe920f8
MEH: libc-2.26.so | mmap: 7fd0ae16a000 7fd0ae521000 | dwfl: 7fd0ae16a000
7fd0ae5208f0
MEH: libstdc++.so.6.0.24 | mmap: 7fd0aea84000 7fd0aee0b000 | dwfl:
7fd0aea84000 7fd0aee0a640
MEH: libzstd.so.1.3.3 | mmap: 7fd0aee0b000 7fd0af087000 | dwfl: 7fd0aee0b000
7fd0af086030
Interestingly, here the mmap events observed by perf are actually always
*larger* than what dwfl sees...
--
Milian Wolff
[email protected]
http://milianw.de
signature.asc
Description: This is a digitally signed message part.
