Re: [PATCH] perf script: Fix LBR skid dump problems in brstackinsn
On Thursday, 6 December 2018 23:52:07 CET Andi Kleen wrote:
> On Thu, Dec 06, 2018 at 06:29:20PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Dec 06, 2018 at 12:51:48PM -0800, Andi Kleen wrote:
> > > On Thu, Dec 06, 2018 at 02:01:40PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Mon, Nov 19, 2018 at 09:06:17PM -0800, Andi Kleen wrote:
> > > > > From: Andi Kleen
> > > > >
> > > > > This is a fix for another instance of the skid problem Milian
> > > > > recently found [1]
> >
> > I think you forgot to add the reference, i.e. what is the url or
> > message-id that this [1] refers to?
>
> Hmm, I thought I saw some patches from Milian for this earlier,
> but now I can't find them. Perhaps I misremember. Milian
> can point to them if they exist and are not just a figment
> of my imagination :-)

I only have very early POC patches, cf.:

https://lkml.org/lkml/2018/11/14/608

I've now also pushed that on my WIP branch:

https://github.com/milianw/linux/tree/pebs-callchain-breakage

I haven't had the time since to work on this. The patches as-is are not upstreamable. There are some open questions on my side (see mail referenced above).

> These were the changes to report the stack frame RIP/RSP in the PEBS
> handler and use it for unwinding in perf.

Yes, I was looking at something different. I've no experience with brstackinsn usage in perf, so I can't really add my tested-by.

Cheers
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
[tip:perf/core] perf script: Share code and output format for uregs and iregs output
Commit-ID:  9add8fe8e6f63db47e40e65173530dcb68cd7a07
Gitweb:     https://git.kernel.org/tip/9add8fe8e6f63db47e40e65173530dcb68cd7a07
Author:     Milian Wolff
AuthorDate: Wed, 7 Nov 2018 23:34:37 +0100
Committer:  Arnaldo Carvalho de Melo
CommitDate: Wed, 21 Nov 2018 12:00:32 -0300

perf script: Share code and output format for uregs and iregs output

The iregs output was missing the newline at end as well as the leading ABI output. This made it hard to compare the iregs and uregs values. Instead, use a single function to output the register values and use it for both, iregs and uregs, to ensure the output is consistent.

Before:

  perf 7049 [-01] 1343.354347: 1 cycles:ppp:
      a7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
   AX:0x8000 BX:0x0 CX:0x0 DX:0x7 SI:0xf DI:0x286 BP:0x95bc8213a460 SP:0xacbf0ba97d18 IP:0xa7bc21cd FLAGS:0x28e CS:0x10 SS:0x18 R8:0x2 R9:0x21440 R10:0x33816fb3b8c R11:0x1 R12:0x95bc8213a460 R13:0x95bc8213a400 R14:0x95bc8213a400 R15:0x1 ABI:2 AX:0xffda BX:0x CX:0x7f84ad85798b DX:0x560209699d50 SI:0x7ffe2c7a6820 DI:0x7ffe2c7a8c9b BP:0x7ffe2c7a20d0 SP:0x7ffe2c7a2058 IP:0x7f84ad85798b FLAGS:0x206 CS:0x33 SS:0x2b R8:0x7ffe2c7a2030 R9:0x7f84ae55f010 R10:0x8 R11:0x206 R12:0x R13:0x R14:0x R15:0x
  perf 7049 [-01] 1343.354363: 1 cycles:ppp:
  ...

After:

  perf 7049 [-01] 1343.354347: 1 cycles:ppp:
      a7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
      a840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
   ABI:2 AX:0x8000 BX:0x0 CX:0x0 DX:0x7 SI:0xf DI:0x286 BP:0x95bc8213a460 SP:0xacbf0ba97d18 IP:0xa7bc21cd FLAGS:0x28e CS:0x10 SS:0x18 R8:0x2 R9:0x21440 R10:0x33816fb3b8c R11:0x1 R12:0x95bc8213a460 R13:0x95bc8213a400 R14:0x95bc8213a400 R15:0x1
   ABI:2 AX:0xffda BX:0x CX:0x7f84ad85798b DX:0x560209699d50 SI:0x7ffe2c7a6820 DI:0x7ffe2c7a8c9b BP:0x7ffe2c7a20d0 SP:0x7ffe2c7a2058 IP:0x7f84ad85798b FLAGS:0x206 CS:0x33 SS:0x2b R8:0x7ffe2c7a2030 R9:0x7f84ae55f010 R10:0x8 R11:0x206 R12:0x R13:0x R14:0x R15:0x
  perf 7049 [-01] 1343.354363: 1 cycles:ppp:
  ...

Signed-off-by: Milian Wolff
Acked-by: Jiri Olsa
Link: http://lkml.kernel.org/r/20181107223437.9071-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 40 +---
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index daf73832743e
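The idea in the commit message, one printer shared by iregs and uregs so that both get the leading ABI field and the trailing newline, can be sketched roughly as below. This is only an illustration and not the code from the (truncated) diff: `struct regs_dump_sketch`, `reg_names_sketch` and `fprintf_regs_sketch()` are hypothetical names, and the four-entry register table is made up. Only the `"%5s:0x%" PRIx64 " "` format string mirrors what perf's uregs printing already uses.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a sampled register set (not perf's real type). */
struct regs_dump_sketch {
	uint64_t abi;          /* register ABI reported with the sample */
	const uint64_t *regs;  /* values, one per bit set in the mask */
};

/* Hypothetical name table; perf itself resolves names via perf_reg_name(). */
static const char *const reg_names_sketch[] = { "AX", "BX", "CX", "DX" };
#define NR_REGS_SKETCH 4

/*
 * One printer for both iregs and uregs: leading ABI field, one
 * "name:0xvalue" entry per register selected by the mask, and a
 * trailing newline. Returns the number of characters printed.
 */
static int fprintf_regs_sketch(const struct regs_dump_sketch *dump,
			       uint64_t mask, FILE *fp)
{
	int printed = 0;
	unsigned int i = 0;

	if (!dump || !dump->regs)
		return 0;

	printed += fprintf(fp, " ABI:%" PRIu64 " ", dump->abi);
	for (unsigned int r = 0; r < NR_REGS_SKETCH; r++) {
		if (!(mask & (1ULL << r)))
			continue;
		printed += fprintf(fp, "%5s:0x%" PRIx64 " ",
				   reg_names_sketch[r], dump->regs[i++]);
	}
	printed += fprintf(fp, "\n");
	return printed;
}
```

Calling the same helper once for the interrupt registers and once for the user registers yields two lines of identical shape, which is what makes the values easy to compare.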
[tip:perf/core] perf script: Add newline after uregs output
Commit-ID:  b07d16f7e9e4cf2562f61b5f68a4b0831fe5ef14
Gitweb:     https://git.kernel.org/tip/b07d16f7e9e4cf2562f61b5f68a4b0831fe5ef14
Author:     Milian Wolff
AuthorDate: Wed, 7 Nov 2018 10:37:05 +0100
Committer:  Arnaldo Carvalho de Melo
CommitDate: Wed, 21 Nov 2018 12:00:31 -0300

perf script: Add newline after uregs output

This change makes it much easier to easily distinguish between consecutive samples by keeping the empty line between them, like we see when we do not enable uregs output.

Before:

  cpp-inlining 28298 [-01] 54837.342780:3068085 cycles:pp:
      77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
      ...
   ABI:2 AX:0x0 BX:0x40f56cf6 CX:0x294a3ae7...
  cpp-inlining 28298 [-01] 54837.344493:2881929 cycles:pp:
      77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
      ...
   ABI:2 AX:0x40d440c7 BX:0x40d440c7 CX:0x4d45e5da...

After:

  cpp-inlining 28298 [-01] 54837.342780:3068085 cycles:pp:
      77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
      ...
   ABI:2 AX:0x0 BX:0x40f56cf6 CX:0x294a3ae7...

  cpp-inlining 28298 [-01] 54837.344493:2881929 cycles:pp:
      77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
      ...
   ABI:2 AX:0x40d440c7 BX:0x40d440c7 CX:0x4d45e5da...

Signed-off-by: Milian Wolff
Cc: Jiri Olsa
Link: http://lkml.kernel.org/r/20181107093705.16346-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index b5bc85bd0bbe..daf73832743e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -603,6 +603,8 @@ static int perf_sample__fprintf_uregs(struct perf_sample *sample,
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), val);
 	}
 
+	fprintf(fp, "\n");
+
 	return printed;
 }
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Thursday, 15 November 2018 03:05:32 CET Travis Downs wrote:
> On Wed, Nov 14, 2018 at 8:20 AM Milian Wolff wrote:
> > 3) I suggest we always keep the first frame and sample IP from the user
> > regs, i.e. the accurate PEBS/IBS IP. Then we add the following frames
> > from unwinding the ustack with the iregs.
>
> Does this mean that the displayed unwind will sometimes be
> "impossible" to have actually be generated from a consistent execution
> of the user program?

Yes, that is exactly what I'm saying.

> For example, the top frame (from PEBS) and second frame (from iregs)
> may be inconsistent in that the latter function never calls the first.
> At this point it would be good to have an indication that the top frame
> is from a different source than the rest of the frames, lest the user
> pull his hair out trying to determine how function X seems to call
> function Y despite that not being the case in the source.

I agree. I personally like your suggested approach: only add an indication when the IP differs so much that it points to a different function.

What do others say to this?

Cheers
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
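The "only mark the top frame when the IP lands in a different function" idea could look roughly like the following. This is a sketch under assumptions, not perf code: `struct sym_sketch`, `find_sym_sketch()` and `top_frame_needs_marker()` are hypothetical names, and a flat array of address ranges stands in for perf's real symbol lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: a function covering [start, end). */
struct sym_sketch {
	const char *name;
	uint64_t start;
	uint64_t end;
};

/* Linear lookup of the symbol containing ip; NULL if nothing matches. */
static const struct sym_sketch *find_sym_sketch(const struct sym_sketch *syms,
						size_t nr, uint64_t ip)
{
	for (size_t i = 0; i < nr; i++)
		if (ip >= syms[i].start && ip < syms[i].end)
			return &syms[i];
	return NULL;
}

/*
 * True when the precise sample IP and the IP the unwinder started from
 * resolve to different functions, or when only one of them resolves.
 * Two unresolved IPs compare equal here and are not marked.
 */
static bool top_frame_needs_marker(const struct sym_sketch *syms, size_t nr,
				   uint64_t sample_ip, uint64_t unwind_ip)
{
	return find_sym_sketch(syms, nr, sample_ip) !=
	       find_sym_sketch(syms, nr, unwind_ip);
}
```

With this shape, a few bytes of skid inside one function stay silent, while skid across a call or return boundary produces the indication Travis asked for.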
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
```
[-01]57.870061: 701199 cycles:pppu:
    7fc1042797b4 __hypot_finite+0x154 (/usr/lib/libm-2.28.so)
    7fc1042797b5 __hypot_finite+0x155 (/usr/lib/libm-2.28.so)
    7fc10425faf8 hypotf32x+0x18 (/usr/lib/libm-2.28.so) (unwind ip differs)
    5622c7452128 main+0x88 (/tmp/cpp-inlining)
    7fc104096222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
    5622c74521ed _start+0x2d (/tmp/cpp-inlining)
```

But always skipping the IP is also sometimes wrong, like in this case:

```
cpp-inlining 2605 [-01]57.862313: 694984 cycles:pppu:
    7fc1042797b9 __hypot_finite+0x159 (/usr/lib/libm-2.28.so)
    5622c7452128 main+0x88 (/tmp/cpp-inlining)
    7fc104096222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
    5622c74521ed _start+0x2d (/tmp/cpp-inlining)
```

Here, we are missing the hypotf32x call in between __hypot_finite and main. Do we want to introduce some heuristic on how to handle these scenarios? I.e. if uregs->ip and iregs->ip point to the same function symbol, then skip the frame for iregs->ip, otherwise add it?

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

From 422d2a95eff344407ec425f0de55b264841d1757 Mon Sep 17 00:00:00 2001
From: Milian Wolff
Date: Wed, 14 Nov 2018 14:10:47 +0100
Subject: [PATCH 1/2] [WIP] perf: make it possible to collect both, iregs and uregs

Previously, only one set of registers was stored in the perf data for both, user and interrupt registers. Now, two distinct sets can be sampled.

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
Cc: Andi Kleen
Cc: Jiri Olsa
---
 arch/x86/events/amd/ibs.c        | 2 +-
 arch/x86/events/core.c           | 2 +-
 arch/x86/events/intel/core.c     | 2 +-
 arch/x86/events/intel/ds.c       | 7 +++
 arch/x86/events/intel/knc.c      | 2 +-
 arch/x86/events/intel/p4.c       | 2 +-
 arch/x86/kernel/ptrace.c         | 2 +-
 arch/x86/kvm/pmu.c               | 4 ++--
 drivers/oprofile/nmi_timer_int.c | 2 +-
 include/linux/perf_event.h       | 18 +++--
 kernel/events/core.c             | 34
 kernel/trace/bpf_trace.c         | 2 +-
 kernel/watchdog_hld.c            | 2 +-
 13 files changed, 43 insertions(+), 38 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4dc0650..567db8878511 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -670,7 +670,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		data.raw =
 	}
 
-	throttle = perf_event_overflow(event, , );
+	throttle = perf_event_overflow(event, , , iregs);
 out:
 	if (throttle)
 		perf_ibs_stop(event, 0);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 106911b603bd..acdcafa57ca0 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1493,7 +1493,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 		if (!x86_perf_event_set_period(event))
 			continue;
 
-		if (perf_event_overflow(event, , regs))
+		if (perf_event_overflow(event, , regs, regs))
 			x86_pmu_stop(event, 0);
 	}
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 273c62e81546..2156620b3d9e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2299,7 +2299,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		if (has_branch_stack(event))
 			data.br_stack = >lbr_stack;
 
-		if (perf_event_overflow(event, , regs))
+		if (perf_event_overflow(event, , regs, regs))
 			x86_pmu_stop(event, 0);
 	}
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index b7b01d762d32..018fc0649033 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -639,7 +639,7 @@ int intel_pmu_drain_bts_buffer(void)
 	 * the sample.
 	 */
 	rcu_read_lock();
-	perf_prepare_sample(, , event, );
+	perf_prepare_sample(, , event, , );
 
 	if (perf_output_begin(, event, header.size *
 			      (top - base - skip)))
@@ -1273,7 +1273,6 @@ static void setup_pebs_sample_data(struct perf_event *event,
 		set_linear_ip(regs, pebs->ip);
 	}
 
-
 	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
 	    x86_pmu.intel_cap.pebs_format >= 1)
 		data->addr = pebs->dla;
@@ -1430,7 +1429,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	while (count > 1) {
 		setup_pebs_sample_data(event, iregs, at, , );
-		perf_event_output(event, , );
+		perf_event_output(event, , , iregs);
 		at += x86_pmu.pebs_record_size;
 		at = get_next_pebs_record_by_bit(at, top, bit);
 		count--;
@@ -1442,7 +1441,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	 * All but the last records are processed.
 	 * The last one is left to be able to call the overflow handler.
 	 */
-	if (perf_event_overfl
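The skip-or-add heuristic asked about in the mail above (keep the precise sample IP as the first frame, then drop the unwinder's first frame only when it lies in the same function) might be assembled along these lines. This is a sketch, not part of the attached WIP patch: `merge_callchain_sketch()` is a hypothetical name, and the caller is assumed to have already decided whether both IPs resolve to the same symbol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Merge the precise sample IP with the frames unwound from the
 * interrupt registers. unwound[0] is the interrupt IP itself.
 *
 * - The precise IP is always emitted first.
 * - If both IPs are in the same function, unwound[0] would only
 *   duplicate the top frame, so it is skipped.
 * - Otherwise unwound[0] is kept, so an intermediate caller is not lost.
 *
 * out must have room for nr_unwound + 1 entries. Returns the number
 * of entries written.
 */
static size_t merge_callchain_sketch(uint64_t sample_ip,
				     const uint64_t *unwound, size_t nr_unwound,
				     bool same_function, uint64_t *out)
{
	size_t n = 0;
	size_t first = (nr_unwound > 0 && same_function) ? 1 : 0;

	out[n++] = sample_ip;
	for (size_t i = first; i < nr_unwound; i++)
		out[n++] = unwound[i];
	return n;
}
```

Applied to the first trace in the mail, where both top entries are in `__hypot_finite`, the duplicate `+0x155` frame would be dropped. In the second trace the frame to keep would be the one in `hypotf32x`, assuming that is where the interrupt IP pointed.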
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Wednesday, 7 November 2018 23:41:31 CET Milian Wolff wrote:
> On Tuesday, 6 November 2018 21:24:11 CET Andi Kleen wrote:
> > > Where would I look for the source to change here? So far, I only
> > > concentrated on the userspace side of perf in tools/perf.
> >
> > Kind of similar to
> >
> > a405bad5ad20 perf/x86: Add Haswell specific transaction flag reporting
> > fdfbbd07e91f perf: Add generic transaction flags
> >
> > Report the original (not overwritten) regs->ip and regs->sp
>
> Thanks a lot Andi! With your help, I have managed to find the exact issue
> for my scenario. Turns out, it really is "just" the instruction pointer
> that is wrong. I.e. originally we have IP = 0x7feda32ca68c, but with PEBS
> we correct that to IP = 0x7feda32ca688. The SP register value stays the same
> according to my printk output. Using the original IP value, we can unwind
> correctly since we point to the correct place in the .eh_frame section. The
> PEBS IP points to a different position in the .eh_frame section, which is
> "too early".
>
> That brings up some questions:
>
> - I noticed `perf record --intr-regs`, but the values recorded in the
> perf.data file are always the same. I.e. comparing uregs and iregs, I always
> see the same values printed by `perf script`. This smells like a bug to me,
> but so far I haven't figured out why this happens...

The reason seems to be that perf_event_output only takes one set of registers, which then gets handed down into perf_prepare_sample where it gets sampled. Thus if the sample type has both PERF_SAMPLE_REGS_USER and PERF_SAMPLE_REGS_INTR set, then by design both will store the same values for user space samples.

Can we change this, such that perf_event_output also takes a second set of registers (iregs) that get sampled for PERF_SAMPLE_REGS_INTR? I'm very new to real kernel development, what kind of ABI/API stability guarantees exist for something like "perf_event_output"?

> - Independently, when I add a custom printk manually in
> `arch/x86/events/intel/ds.c` at the end of `setup_pebs_sample_data`, then I'm never seeing
> any differences between SP in iregs/pebs/regs. Shouldn't it also be
> recorded via PEBS? Or is it just chance that I'm never seeing any
> difference in setup_pebs_sample_data between iregs->sp and regs->sp?

The reason here seems to be that the registers stored in "pebs" are essentially the same as iregs for the setup for `perf record --call-graph dwarf`. The difference is the availability of `pebs->real_ip`, which gets used on my system to fix up the IP. SP stays untouched and is thus only truly valid for the untouched IP (which is discarded currently, see above).

> - Generally, how do we want to handle this bug? If `--intr-regs` would
> actually record a different IP than stored in uregs in the perf.data file,
> then we could use that as a fallback for unwinding, when it fails the first
> time. Or should we always unwind from that IP? How do we mark the "actual"
> frame/IP then, if that differs?
>
> Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
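To make the single-register-set observation concrete, here is a toy model of the two shapes discussed above. It is not kernel code: the `SKETCH_*` flags are local stand-ins for PERF_SAMPLE_REGS_USER and PERF_SAMPLE_REGS_INTR, and all struct and function names are hypothetical. The first function mirrors the behaviour described in the mail (one `regs` pointer feeds both fields); the second takes the interrupt registers separately, as proposed.

```c
#include <stdint.h>

/* Local stand-ins for PERF_SAMPLE_REGS_USER / PERF_SAMPLE_REGS_INTR. */
enum {
	SKETCH_REGS_USER = 1u << 0,
	SKETCH_REGS_INTR = 1u << 1,
};

/* Reduced register set: only what matters for the unwinding problem. */
struct regs_sketch {
	uint64_t ip;
	uint64_t sp;
};

struct sample_sketch {
	struct regs_sketch user;
	struct regs_sketch intr;
};

/* Current shape: one register set is copied into both sample fields. */
static void prepare_sample_one(struct sample_sketch *s, unsigned int type,
			       const struct regs_sketch *regs)
{
	if (type & SKETCH_REGS_USER)
		s->user = *regs;
	if (type & SKETCH_REGS_INTR)
		s->intr = *regs;
}

/* Proposed shape: the interrupt registers are passed separately. */
static void prepare_sample_two(struct sample_sketch *s, unsigned int type,
			       const struct regs_sketch *regs,
			       const struct regs_sketch *iregs)
{
	if (type & SKETCH_REGS_USER)
		s->user = *regs;
	if (type & SKETCH_REGS_INTR)
		s->intr = *iregs;
}
```

In the first variant a consumer can never see the untouched interrupt IP once the PEBS fix-up has been applied to `regs`. In the second, both the corrected and the original IP survive into the sample.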
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Tuesday, 6 November 2018 21:24:11 CET Andi Kleen wrote:
> > Where would I look for the source to change here? So far, I only
> > concentrated on the userspace side of perf in tools/perf.
>
> Kind of similar to
>
> a405bad5ad20 perf/x86: Add Haswell specific transaction flag reporting
> fdfbbd07e91f perf: Add generic transaction flags
>
> Report the original (not overwritten) regs->ip and regs->sp

Thanks a lot Andi! With your help, I have managed to find the exact issue for my scenario. Turns out, it really is "just" the instruction pointer that is wrong. I.e. originally we have IP = 0x7feda32ca68c, but with PEBS we correct that to IP = 0x7feda32ca688. The SP register value stays the same according to my printk output. Using the original IP value, we can unwind correctly since we point to the correct place in the .eh_frame section. The PEBS IP points to a different position in the .eh_frame section, which is "too early".

That brings up some questions:

- I noticed `perf record --intr-regs`, but the values recorded in the perf.data file are always the same. I.e. comparing uregs and iregs, I always see the same values printed by `perf script`. This smells like a bug to me, but so far I haven't figured out why this happens...

- Independently, when I add a custom printk manually in `arch/x86/events/intel/ds.c` at the end of `setup_pebs_sample_data`, then I'm never seeing any differences between SP in iregs/pebs/regs. Shouldn't it also be recorded via PEBS? Or is it just chance that I'm never seeing any difference in setup_pebs_sample_data between iregs->sp and regs->sp?

- Generally, how do we want to handle this bug? If `--intr-regs` would actually record a different IP than stored in uregs in the perf.data file, then we could use that as a fallback for unwinding, when it fails the first time. Or should we always unwind from that IP? How do we mark the "actual" frame/IP then, if that differs?
Thanks

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
[PATCH] perf script: share code and output format for uregs and iregs output
The iregs output was missing the newline at the end as well as the leading ABI output. This made it hard to compare the iregs and uregs values. Instead, use a single function to output the register values and use it for both iregs and uregs, to ensure the output is consistent.

Before:

```
perf 7049 [-01] 1343.354347: 1 cycles:ppp:
    a7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
 AX:0x8000 BX:0x0 CX:0x0 DX:0x7 SI:0xf DI:0x286 BP:0x95bc8213a460 SP:0xacbf0ba97d18 IP:0xa7bc21cd FLAGS:0x28e CS:0x10 SS:0x18 R8:0x2 R9:0x21440 R10:0x33816fb3b8c R11:0x1 R12:0x95bc8213a460 R13:0x95bc8213a400 R14:0x95bc8213a400 R15:0x1 ABI:2 AX:0xffda BX:0x CX:0x7f84ad85798b DX:0x560209699d50 SI:0x7ffe2c7a6820 DI:0x7ffe2c7a8c9b BP:0x7ffe2c7a20d0 SP:0x7ffe2c7a2058 IP:0x7f84ad85798b FLAGS:0x206 CS:0x33 SS:0x2b R8:0x7ffe2c7a2030 R9:0x7f84ae55f010 R10:0x8 R11:0x206 R12:0x R13:0x R14:0x R15:0x

perf 7049 [-01] 1343.354363: 1 cycles:ppp:
...
```

After:

```
perf 7049 [-01] 1343.354347: 1 cycles:ppp:
    a7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    a840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
 ABI:2 AX:0x8000 BX:0x0 CX:0x0 DX:0x7 SI:0xf DI:0x286 BP:0x95bc8213a460 SP:0xacbf0ba97d18 IP:0xa7bc21cd FLAGS:0x28e CS:0x10 SS:0x18 R8:0x2 R9:0x21440 R10:0x33816fb3b8c R11:0x1 R12:0x95bc8213a460 R13:0x95bc8213a400 R14:0x95bc8213a400 R15:0x1
 ABI:2 AX:0xffda BX:0x CX:0x7f84ad85798b DX:0x560209699d50 SI:0x7ffe2c7a6820 DI:0x7ffe2c7a8c9b BP:0x7ffe2c7a20d0 SP:0x7ffe2c7a2058 IP:0x7f84ad85798b FLAGS:0x206 CS:0x33 SS:0x2b R8:0x7ffe2c7a2030 R9:0x7f84ae55f010 R10:0x8 R11:0x206 R12:0x R13:0x R14:0x R15:0x

perf 7049 [-01] 1343.354363: 1 cycles:ppp:
...
```

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 40 -
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index daf73832743e..04913136bac9 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -566,30 +566,10 @@ static int perf_session__check_output_opt(struct perf_session *session)
 	return 0;
 }

-static int perf_sample__fprintf_iregs(struct perf_sample *sample,
-				      struct perf_event_attr *attr, FILE *fp)
-{
-	struct regs_dump *regs = &sample->intr_regs;
-	uint64_t mask = attr->sample_regs_intr;
-	unsi
[PATCH] perf script: add newline after uregs output
This change makes it much easier to distinguish between consecutive samples by keeping the empty line between them, like we see when we do not enable uregs output.

Before:

```
cpp-inlining 28298 [-01] 54837.342780: 3068085 cycles:pp:
    77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
    ...
 ABI:2 AX:0x0 BX:0x40f56cf6 CX:0x294a3ae7...
cpp-inlining 28298 [-01] 54837.344493: 2881929 cycles:pp:
    77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
    ...
 ABI:2 AX:0x40d440c7 BX:0x40d440c7 CX:0x4d45e5da...
```

After:

```
cpp-inlining 28298 [-01] 54837.342780: 3068085 cycles:pp:
    77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
    ...
 ABI:2 AX:0x0 BX:0x40f56cf6 CX:0x294a3ae7...

cpp-inlining 28298 [-01] 54837.344493: 2881929 cycles:pp:
    77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
    ...
 ABI:2 AX:0x40d440c7 BX:0x40d440c7 CX:0x4d45e5da...
```

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index b5bc85bd0bbe..daf73832743e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -603,6 +603,8 @@ static int perf_sample__fprintf_uregs(struct perf_sample *sample,
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), val);
 	}

+	fprintf(fp, "\n");
+
 	return printed;
 }

--
2.19.1
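The output convention these register patches aim for (one routine for both register sets, a leading ABI field, and a terminating newline) can be sketched in Python as below. This is an illustration only, not the code in builtin-script.c: `format_regs`, `REG_NAMES` and the sample values are assumptions, and only the `%5s:0x%x` field layout is taken from the `fprintf` call visible in the diff above.

```python
from typing import Sequence

# Assumed register names indexed by mask bit; in perf the names come from
# perf_reg_name() and depend on the architecture.
REG_NAMES = ["AX", "BX", "CX", "DX"]


def format_regs(abi: int, mask: int, values: Sequence[int]) -> str:
    """Format one register set: ABI prefix, one field per set mask bit, newline."""
    out = " ABI:%d " % abi
    packed = iter(values)          # values are stored only for set mask bits
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            out += "%5s:0x%x " % (REG_NAMES[bit], next(packed))
    return out + "\n"


# Made-up sample values; both register sets go through the same function,
# so they share the same layout and each ends with a newline.
iregs = format_regs(abi=2, mask=0b0101, values=[0x8000, 0x7])
uregs = format_regs(abi=2, mask=0b1111, values=[0xFFDA, 0x0, 0x1, 0x2])
print(iregs + uregs, end="")
```

Because both sets are produced by the same function, a line-by-line comparison of iregs and uregs becomes straightforward.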
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Tuesday, 6 November 2018 09:39:57 CET Jiri Olsa wrote:
> On Mon, Nov 05, 2018 at 04:10:37PM -0800, Andi Kleen wrote:
> > > > > - PMU triggers interrupt and PEBS stores RIP etc.
> > > > > - code continues to execute, possibly changing the stack
> > > >
> > > > I don't think the code continues to execute.. the stack is ok
> > >
> > > Are you sure about this? I mean, isn't that the whole reason why we need
> > > PEBS? Generally, if you are sure about this, can you point me to some
> > > documentation on this to allow me to understand it better?
> >
> > Milian is right.
> >
> > There is an execution window from PEBS capturing registers to actually
> > triggering the PMU, and if there is stack manipulation in that window
> > the PEBS state might be out of sync with the real stack.
>
> hum, is this about having 'large pebs' or there's this window
> if there's also only single pebs record allowed? which should
> be case for dwarf unwind
>
> > The right RIP/RSP to use for the stack unwinding is always the data
> > in the PMI's exception frame on the stack.
> >
> > Probably would need to modify perf to report those too in addition
> > to the PEBS registers.
>
> ok, should not be that hard

Where would I look for the source to change here? So far, I have only concentrated on the userspace side of perf in tools/perf.

Thanks

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Monday, 5 November 2018 21:51:19 CET Jiri Olsa wrote:
> On Fri, Nov 02, 2018 at 06:56:50PM +0100, Milian Wolff wrote:
>
> SNIP
>
> > > > Note how precise levels 0 and 1 do not produce any samples where
> > > > unwinding fails. But precise level 2 produces some, and precise
> > > > level 3 increases the amount (by ca. 2x).
> > > >
> > > > I can reproduce this pattern on two separate Intel CPUs and kernel
> > > > versions currently:
> > > >
> > > > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > > > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> > > >
> > > > Could someone else try this? What about AMD and IBS - is it also
> > > > affected? What about newer/different Intel CPUs?
> > >
> > > I tried on intel and can't actually see that.. what do the failed samples
> > > look like? like is there the stack dump attached, what's in the regs?
> > >
> > > could you please paste the 'perf report -D' output for some of the
> > > failed samples?
> >
> > See here for one case: https://paste.kde.org/prryvdilq
>
> we should really print some helpful debug output
> for this.. like to show some markers where the stack
> data starts

Further down below, the offset for the ustack start is given (0xe0). But yes, that would be welcome.

> > What Intel CPU did you use? What microcode version? Which kernel version?
> >
> > Generally, isn't what I'm seeing actually a necessary evil of the ustack
> > based unwinding in perf? I mean, the general procedure is as follows if
> > I'm not mistaken:
> >
> > - PMU triggers interrupt and PEBS stores RIP etc.
> > - code continues to execute, possibly changing the stack
>
> I don't think the code continues to execute.. the stack is ok

Are you sure about this? I mean, isn't that the whole reason why we need PEBS? Generally, if you are sure about this, can you point me to some documentation on this to allow me to understand it better?
Also, how do you explain the scenario I am seeing, with `cycles:` and `cycles:p` not suffering from this issue, but `cycles:pp` and `cycles:ppp` leading to broken samples? It _has_ to be PEBS then, no? What else could explain this?

> the problem I saw in past is that the copy from user is not
> 100% and sometimes you might not get full stack data you
> asked for

But that would indicate missing data at the end of the ustack dump? In our case, the "problematic" data is always at the start. Also note the apparent shift in the ustack copy which, in one case, directly correlates with the code being executed, i.e. from objdump in libm I see:

0x00029688 <+40>: sub $0x28,%rsp

(https://paste.kde.org/poywa7y2z)

The address of the expected parent frame is 7ffff7c7caf8 (hypotf32x+0x18). This can be found at offset 80 in the ustack dump (cf. https://paste.kde.org/prryvdilq: "f9 ca c7 f7 ff 7f" is found at 0x130, minus 0xe0 yields 0x50 or 80).

From the libunwind (or libdw) debug output in perf, we see that the unwinder tries to access offset 32 (cf. https://paste.kde.org/prryvdilq#line-610), which is off by 48 from the desired value of 80. This offset is *very* close to the value of 40 we see in the libm disassembly for __hypot_finite ("$0x28,%rsp"). Is this really just a coincidence?

> have you tried with libdw unwinder? if one of the unwinder
> shows more callchains, we need to fix the other one ;-)

Yes, I've looked at both unwinders. Both try to access the same values, and both break due to seemingly wrong data being read from the stack. And if you look at my other patches, you may have seen that I've regularly fixed the libdw unwinder to bring it closer to libunwind.

Thanks

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
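The offset arithmetic in this message can be checked with a few lines of Python. This is a sketch under stated assumptions: `find_le_address` is a hypothetical helper, the stack contents are synthetic (the real dump lives in the linked paste and is not reproduced here), and the full address 0x7ffff7c7caf9 is inferred from the little-endian byte sequence "f9 ca c7 f7 ff 7f" quoted above.

```python
def find_le_address(stack: bytes, address: int, width: int = 8) -> int:
    """Byte offset of `address` stored little-endian in `stack`, or -1."""
    return stack.find(address.to_bytes(width, "little"))


# Numbers taken from the message above.
USTACK_START = 0xE0     # position in the dump where the ustack data begins
RETADDR_POS = 0x130     # position where "f9 ca c7 f7 ff 7f" was found
UNWINDER_OFFSET = 32    # ustack offset the unwinder tried to read
PROLOGUE_SUB = 0x28     # immediate of "sub $0x28,%rsp"

expected_offset = RETADDR_POS - USTACK_START   # offset within the ustack
shift = expected_offset - UNWINDER_OFFSET      # how far off the unwinder is
print(expected_offset, shift, shift - PROLOGUE_SUB)

# Synthetic stack: zero padding followed by the address whose low six bytes
# are f9 ca c7 f7 ff 7f.
stack = bytes(expected_offset) + (0x7FFFF7C7CAF9).to_bytes(8, "little")
print(find_le_address(stack, 0x7FFFF7C7CAF9))
```

The remaining difference between the observed shift and the 0x28 stack adjustment is 8 bytes; the message leaves open whether that is a coincidence.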
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Friday, 2 November 2018 12:26:35 CET Jiri Olsa wrote:
> On Thu, Nov 01, 2018 at 11:08:18PM +0100, Milian Wolff wrote:
> > On Tuesday, 30 October 2018 23:34:35 CET Milian Wolff wrote:
> > > On Wednesday, 24 October 2018 16:48:18 CET Andi Kleen wrote:
> > > > > Can someone at least confirm whether unwinding from a function
> > > > > prologue via .eh_frame (but without .debug_frame) should actually
> > > > > be possible?
> > > >
> > > > Yes it should be possible. Asynchronous unwind tables should work
> > > > from any instruction.
> > >
> > > We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for
> > > "f9 da 5b 34 91 7f"). Using that address makes unwinding work for this
> > > sample. What could be the reason for this shift?
> >
> > I believe I have found the culprit: PEBS seems to be at fault here - i.e.
> > the RIP/RSP and the ustack dump of the sample simply don't fit together.
> >
> > Check this out:
> >
> > ```
> > $ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> >
> > $ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> >
> > $ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 37
> > 39
> > 35
> > 28
> > 40
> > 39
> > 29
> > 37
> > 31
> > 26
> >
> > $ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 79
> > 70
> > 76
> > 77
> > 70
> > 90
> > 64
> > 78
> > 86
> > 74
> > ```
> >
> > Note how precise levels 0 and 1 do not produce any samples where unwinding
> > fails. But precise level 2 produces some, and precise level 3 increases
> > the amount (by ca. 2x).
> >
> > I can reproduce this pattern on two separate Intel CPUs and kernel
> > versions currently:
> >
> > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> >
> > Could someone else try this? What about AMD and IBS - is it also affected?
> > What about newer/different Intel CPUs?
>
> I tried on intel and can't actually see that.. what do the failed samples
> look like? like is there the stack dump attached, what's in the regs?
>
> could you please paste the 'perf report -D' output for some of the
> failed samples?

See here for one case: https://paste.kde.org/prryvdilq

What Intel CPU did you use? What microcode version? Which kernel version?

Generally, isn't what I'm seeing actually a necessary evil of the ustack based unwinding in perf? I mean, the general procedure is as follows if I'm not mistaken:

- PMU triggers interrupt and PEBS stores RIP etc.
- code continues to execute, possibly changing the stack
- PMU interrupt is handled, and a perf sample is generated
- register values are used from "past" status stored in PEBS
- but ustack dump is based on the "current" status

From this latter discrepancy, it must directly follow that *sometimes* the ustack dump represents a state that cannot be unwound from, no?

Cheers

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
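For readers without pcre2grep, the counting step of the experiment quoted above can be approximated in Python. This is a sketch, not part of the original test setup: `count_failed_unwinds` is a hypothetical helper, the `SAMPLE` text is synthetic rather than real `perf script` output, and the regular expression simply reuses the pattern from the quoted commands. The second half averages the failure counts reported in the message.

```python
import re
from statistics import mean

# Same idea as the pcre2grep pattern above: a hypot_finite frame directly
# followed by an [unknown] frame on the next line.
FAILED_UNWIND = re.compile(r"hypot_finite.*\n.*\[unknown\]")


def count_failed_unwinds(perf_script_output: str) -> int:
    """Count callchains where the frame after hypot_finite is [unknown]."""
    return len(FAILED_UNWIND.findall(perf_script_output))


# Synthetic excerpt (not real output): one broken and one intact sample.
SAMPLE = """\
cpp-inlining 28298 54837.342780: 3068085 cycles:pp:
        7ffff7c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
                   0 [unknown] ([unknown])

cpp-inlining 28298 54837.344493: 2881929 cycles:pp:
        7ffff7c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
        7ffff7c7caf8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
"""
print(count_failed_unwinds(SAMPLE))

# Failure counts reported above for the ten runs at precise levels 2 and 3.
PP = [37, 39, 35, 28, 40, 39, 29, 37, 31, 26]
PPP = [79, 70, 76, 77, 70, 90, 64, 78, 86, 74]
print(mean(PP), mean(PPP), round(mean(PPP) / mean(PP), 2))
```

Averaged over the ten runs, the reported counts put the level 3 failure rate at a bit more than twice the level 2 rate, in line with the "ca. 2x" estimate in the message.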
Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Freitag, 2. November 2018 12:26:35 CET Jiri Olsa wrote: > On Thu, Nov 01, 2018 at 11:08:18PM +0100, Milian Wolff wrote: > > On Dienstag, 30. Oktober 2018 23:34:35 CET Milian Wolff wrote: > > > On Mittwoch, 24. Oktober 2018 16:48:18 CET Andi Kleen wrote: > > > > > Can someone at least confirm whether unwinding from a function > > > > > prologue > > > > > via > > > > > .eh_frame (but without .debug_frame) should actually be possible? > > > > > > > > Yes it should be possible. Asynchronous unwind tables should work > > > > from any instruction. > > > > > > > > > We can find `7f91345bdaf8+1 = 7f91345bdaf9" at offset 16 (search for "f9 > > > da > > > 5b 34 91 7f"). Using that address makes unwinding work for this sample. > > > What could be the reason for this shift? > > > > I believe I have found the culprit: PEBS seems to be at fault here - i.e. > > the RIP/RSP and the ustack dump of the sample simply don't fit together. > > > > Check this out: > > > > ``` > > $ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf > > ./cpp- inlining > /dev/null; perf script | pcre2grep -c -M > > "hypot_finite.*\n.*\ [unknown\]"; done > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > > > $ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf > > ./ > > cpp-inlining > /dev/null; perf script | pcre2grep -c -M > > "hypot_finite.*\n.*\ [unknown\]"; done > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > 0 > > > > $ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf > > ./ cpp-inlining > /dev/null; perf script | pcre2grep -c -M > > "hypot_finite.*\n.*\ [unknown\]"; done > > 37 > > 39 > > 35 > > 28 > > 40 > > 39 > > 29 > > 37 > > 31 > > 26 > > > > $ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf > > ./ cpp-inlining > /dev/null; perf script | pcre2grep -c -M > > "hypot_finite.*\n.*\ [unknown\]"; done > > 79 > > 70 > > 76 > > 77 > > 70 > > 90 > > 64 > > 78 > > 86 > > 74 > > ``` > > > > 
> > Note how precise levels 0 and 1 do not produce any samples where unwinding
> > fails. But precise level 2 produces some, and precise level 3 increases
> > the amount (by ca. 2x).
> >
> > I can reproduce this pattern on two separate Intel CPUs and kernel
> > versions currently:
> >
> > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> >
> > Could someone else try this? What about AMD and IBS - is it also affected?
> > What about newer/different Intel CPUs?
>
> I tried on intel and can't actually see that.. how do the failed samples
> look like? like is there the stack dump attached, what's in the regs?
>
> could you please paste the 'perf report -D' output for some of the
> failed samples?

See here for one case: https://paste.kde.org/prryvdilq

What Intel CPU did you use? What microcode version? Which kernel version?

Generally, isn't what I'm seeing actually a necessary evil of the ustack-based unwinding in perf? I mean, the general procedure is as follows, if I'm not mistaken:

- PMU triggers interrupt and PEBS stores RIP etc.
- code continues to execute, possibly changing the stack
- PMU interrupt is handled, and a perf sample is generated
- register values are used from the "past" status stored in PEBS
- but the ustack dump is based on the "current" status

From this latter discrepancy, it must directly follow that *sometimes* the ustack dump represents a state that cannot be unwound from, no?

Cheers

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
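The suspected sequence (registers from the "past", stack copy from the "present") can be illustrated with a toy model. This is only a sketch of the hypothesis in this mail, not of what the kernel or the PMU actually do; every name in it (`toy_cpu`, `toy_sample`, `toy_record`, the addresses 0x1000 and 0x2000) is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 8

/* Hypothetical toy machine: a few 64-bit stack slots plus an index that
 * stands in for RSP. The stack grows towards lower indices. */
struct toy_cpu {
	uint64_t stack[STACK_WORDS];
	unsigned sp;
};

/* Hypothetical sample: the SP recorded at "PEBS time" and the stack copy
 * taken later, when the interrupt is handled. */
struct toy_sample {
	unsigned sp;
	uint64_t ustack[STACK_WORDS];
};

/* A call pushes the return address, a return pops it. */
static void toy_call(struct toy_cpu *c, uint64_t ret_addr)
{
	c->stack[--c->sp] = ret_addr;
}

static void toy_ret(struct toy_cpu *c)
{
	c->sp++;
}

/* Toy unwind rule: the return address is in the slot the sampled SP names. */
static uint64_t toy_unwind(const struct toy_sample *s)
{
	return s->ustack[s->sp];
}

/* skid == 0: registers and stack copy describe the same instant.
 * skid != 0: the sampled function returns and another call reuses its
 * stack slot before the stack is copied. */
static uint64_t toy_record(int skid)
{
	struct toy_cpu cpu;
	struct toy_sample s;

	memset(&cpu, 0, sizeof(cpu));
	cpu.sp = STACK_WORDS;

	toy_call(&cpu, 0x1000);		/* caller enters the sampled function */
	s.sp = cpu.sp;			/* registers are recorded here */

	if (skid) {
		toy_ret(&cpu);		/* execution continues ... */
		toy_call(&cpu, 0x2000);	/* ... and overwrites the slot */
	}

	memcpy(s.ustack, cpu.stack, sizeof(s.ustack));	/* stack copied now */
	return toy_unwind(&s);
}
```

In this model `toy_record(0)` yields the real caller while `toy_record(1)` yields whatever was written into the slot afterwards. In a real sample the stale slot may of course hold arbitrary data rather than another valid return address.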
PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
On Tuesday, 30 October 2018 23:34:35 CET Milian Wolff wrote:
> On Wednesday, 24 October 2018 16:48:18 CET Andi Kleen wrote:
> > > Can someone at least confirm whether unwinding from a function prologue
> > > via .eh_frame (but without .debug_frame) should actually be possible?
> >
> > Yes it should be possible. Asynchronous unwind tables should work
> > from any instruction.
>
> We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for
> "f9 da 5b 34 91 7f"). Using that address makes unwinding work for this
> sample. What could be the reason for this shift?

I believe I have found the culprit: PEBS seems to be at fault here - i.e. the RIP/RSP and the ustack dump of the sample simply don't fit together.

Check this out:

```
$ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
37
39
35
28
40
39
29
37
31
26

$ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
79
70
76
77
70
90
64
78
86
74
```

Note how precise levels 0 and 1 do not produce any samples where unwinding fails. But precise level 2 produces some, and precise level 3 roughly doubles that number.

I can reproduce this pattern on two separate Intel CPUs and kernel versions currently:

Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts

Could someone else try this? What about AMD and IBS - is it also affected? What about newer/different Intel CPUs?

Better yet, can someone come up with a fix for this on Intel with maximum precise level?

Thanks

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
[tip:perf/urgent] perf unwind: Take pgoff into account when reporting elf to libdwfl
Commit-ID: 1fe627da30331024f453faef04d500079b901107
Gitweb: https://git.kernel.org/tip/1fe627da30331024f453faef04d500079b901107
Author: Milian Wolff
AuthorDate: Mon, 29 Oct 2018 15:16:44 +0100
Committer: Arnaldo Carvalho de Melo
CommitDate: Wed, 31 Oct 2018 09:57:50 -0300

perf unwind: Take pgoff into account when reporting elf to libdwfl

libdwfl parses an ELF file itself and creates mappings for the individual sections. perf, on the other hand, sees raw mmap events which represent individual sections. When we encounter an address pointing into a mapping with pgoff != 0, we must take that into account and report the file at the non-offset base address.

This fixes unwinding with libdwfl in some cases. E.g. for a file like:

```
#include <complex>
#include <future>
#include <iostream>
#include <mutex>
#include <random>
#include <vector>

using namespace std;

mutex g_mutex;

// Sum up the norms of random complex numbers while holding the global mutex.
double worker()
{
    lock_guard<mutex> guard(g_mutex);
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 1000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << endl;
    return s;
}

int main()
{
    // Each async task runs worker() on its own thread.
    vector<future<double>> results;
    for (int i = 0; i < 1; ++i) {
        results.push_back(async(launch::async, worker));
    }
    return 0;
}
```

Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp -o cpp-locking`, then record it with `perf record --call-graph dwarf -e sched:sched_switch`.
When you analyze it with `perf script` and libunwind, you should see: ``` cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120 b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux) b16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux) b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux) b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux) b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux) b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux) 7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so) 7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so) 7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so) 7f38e42569e5 __GI___libc_malloc+0x115 (inlined) 7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined) 7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined) 7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined) 7f38e424df36 _IO_new_file_xsputn+0x116 (inlined) 7f38e4242bfb __GI__IO_fwrite+0xdb (inlined) 7f38e463fa6d std::basic_streambuf >::sputn(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator >::_M_put(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator > std::__write(std::ostreambuf_iterator >, char const*, int)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator > std::num_put > >::_M_insert_float(std::ostreambuf_iterator 7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl> 7f38e464bd70 std::ostream& std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25) 563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined) 563b9cb502f7 worker()+0xb7 
(/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking) 563b9cb506fb double std::__invoke_impl(std::__invoke_other, double (*&&)())+0x2b (inlined) 563b9cb506fb std::__invoke_result::type std::__invoke(double (*&&)())+0x2b (inlined) 563b9cb506fb decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x2b (inlined) 563b9cb506fb std::thread::_Invoker >::operator()()+0x2b (inlined) 563b9cb506fb std::__future_base::_Task_setter, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker >, dou> 563b9cb506fb std::_Function_handler (), std::__future_base::_Task_setter 563b9cb507e8 std::function ()>::operator()() const+0x28 (inlined) 563b9cb507e8 std::__future_base::_State_baseV2::_M_do_set(std::function ()>*, bool*)+0x28 (/ssd/milian/> 7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so) 563b9cb51149 __gthread_once+0xe9 (inlined) 563b9cb51149 void std::call_once ()>*, bool*)>
Re: Broken dwarf unwinding - wrong stack pointer register value?
```
_step: dwarf_step returned 1
 >_Ux86_64_step: returning 1
 >_Ux86_64_step: (cursor=0x7fffafa55c10, ip=0xc0d885722245b5e4, cfa=0x7ffd1e276f38)
 >_Ux86_64_step: dwarf_step returned -22
 >_Ux86_64_step: returning -22
unwind: __hypot_finite:ip = 0x7f91345d77b4 (0x297b4)
unwind: '':ip = 0xc0d885722245b5e3 (0x0)
    7f91345d77b4 __hypot_finite+0x154 (/usr/lib/libm-2.28.so)
    c0d885722245b5e3 [unknown] ([unknown])
```

Now, I also tried the following:

```
$ perf probe -x /usr/lib/libm-2.28.so -a __hypot_finite+0x154
$ perf record -F 1000 --call-graph dwarf -e probe_libm:__hypot_finite ./cpp-inlining
```

And all of the samples unwind correctly! This makes me believe that it's not the .eh_frame information which is wrong - otherwise unwinding would always fail from these locations, esp. when using the custom probe trace point. But since this is not happening, what else could it be?

I only see two possibilities: the register values or the stack memory stored in the sample by perf. Wrong register values are unlikely, since I now understand how the .eh_frame contents get analyzed. For __hypot_finite+0x154, we will always end up asking for the address at SP+24. access_mem thus will always look at the address at offset 24, independent of the actual value of SP.

So, what remains is that the stack dump is somehow wrong, i.e. its contents are moved by some offset. Note how I can "fix" the unwinding for such broken samples by manually applying some offset in access_mem. By looking at other samples where unwinding works from __hypot_finite, I could figure out that the correct address to be read for unwinding should be 7f91345bdaf8, e.g.:

```
    7f91345d76ed __hypot_finite+0x8d (/usr/lib/libm-2.28.so)
    7f91345bdaf8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
    5620579cb128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
```

This address indeed occurs in the user stack dump (starting at 0xe0 in the raw event data) for the broken sample, cf.:

```
. 00e0: 00 20 00 00 00 00 00 00 c0 b1 9c 57 20 56 00 00 . .W V..
. 00f0: 70 70 27 1e fd 7f 00 00 f9 da 5b 34 91 7f 00 00 pp'...[4
. 0100: e4 b5 45 22 72 85 d8 c0 c0 1d 16 84 43 30 bb c0 ..E"r...C0..
.
```

We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for "f9 da 5b 34 91 7f"). Using that address makes unwinding work for this sample. What could be the reason for this shift?

Thanks

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
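To make the offsets easier to check, here is a small sketch that decodes the dump bytes quoted in this mail as little-endian 64-bit words. It assumes that the first 8 bytes at 0xe0 are the size field of the user stack dump (0x2000) and that the stack data itself starts at 0xe8; that reading is an assumption, but it is the one under which "offset 16" lands on `f9 da 5b 34 91 7f`. The helper name `read_le64` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack bytes copied from the dump above, starting at 0xe8 in the raw
 * event data (assumed to be the first byte after the 8-byte size field). */
static const uint8_t ustack[] = {
	0xc0, 0xb1, 0x9c, 0x57, 0x20, 0x56, 0x00, 0x00,	/* offset  0 */
	0x70, 0x70, 0x27, 0x1e, 0xfd, 0x7f, 0x00, 0x00,	/* offset  8 */
	0xf9, 0xda, 0x5b, 0x34, 0x91, 0x7f, 0x00, 0x00,	/* offset 16 */
	0xe4, 0xb5, 0x45, 0x22, 0x72, 0x85, 0xd8, 0xc0,	/* offset 24 */
};

/* Read one little-endian 64-bit word at byte offset off. */
static uint64_t read_le64(const uint8_t *buf, size_t len, size_t off)
{
	uint64_t v = 0;
	int i;

	assert(off + 8 <= len);
	for (i = 7; i >= 0; --i)
		v = (v << 8) | buf[off + i];
	return v;
}
```

Reading at offset 24, which is what the unwinder asks for at __hypot_finite+0x154, gives 0xc0d885722245b5e4, the bogus ip shown in the libunwind debug output. Reading at offset 16 gives 0x7f91345bdaf9, the return address into hypotf32x.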
Re: [PATCH] perf util: take pgoff into account when reporting elf to libdwfl
On Monday, 29 October 2018 18:40:14 CET Arnaldo Carvalho de Melo wrote:
> On Mon, Oct 29, 2018 at 04:26:27PM +0100, Milian Wolff wrote:
> > On Monday, October 29, 2018 3:16:44 PM CET Milian Wolff wrote:
> > > Libdwfl parses an ELF file itself and creates mappings for the
> > > individual sections. Perf on the other hand sees raw mmap events which
> > > represent individual sections. When we encounter an address pointing
> > > into a mapping with pgoff != 0, we must take that into account and
> > > report the file at the non-offset base address.
> > >
> > > This fixes unwinding with libdwfl in some cases. E.g. for a file like:
> > >
> > > Note that the backtrace is still stopping too early, when
> > > compared to the nice results obtained via libunwind. It's
> > > unclear so far what the reason for that is.
> >
> > The remaining issue is due to a bug in elfutils:
> >
> > https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html
> >
> > With both patches applied, libunwind and elfutils produce the same output
> > for the above scenario.
>
> I'm updating the patch to remove:
>
> "It's unclear so far what the reason for that is."
>
> Adding:
>
> "See https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html for
> a patch fixing that."
>
> Ok?

Yes, thanks. I figured out the fix for elfutils after I submitted the perf patch.

> Or are you saying that that "unclear" part applies to both libunwind
> and elfutils?

No, libunwind worked fine without these patches for this specific case.

Cheers

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
Re: [PATCH] perf util: take pgoff into account when reporting elf to libdwfl
On Monday, October 29, 2018 3:16:44 PM CET Milian Wolff wrote:
> Libdwfl parses an ELF file itself and creates mappings for the
> individual sections. Perf on the other hand sees raw mmap events which
> represent individual sections. When we encounter an address pointing
> into a mapping with pgoff != 0, we must take that into account and
> report the file at the non-offset base address.
>
> This fixes unwinding with libdwfl in some cases. E.g. for a file like:
>
> Note that the backtrace is still stopping too early, when
> compared to the nice results obtained via libunwind. It's
> unclear so far what the reason for that is.

The remaining issue is due to a bug in elfutils:

https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html

With both patches applied, libunwind and elfutils produce the same output for the above scenario.

Cheers

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
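The "non-offset base address" from the quoted patch description can be sketched as simple arithmetic. This is an illustration only, not the perf code: the helper names `toy_report_base` and `toy_file_offset` are made up, pgoff is treated as a byte offset into the file, and the mapping start and pgoff in the usage note are invented values (only the sampled address 563b9cb502f7 appears in the traces of this thread).

```c
#include <assert.h>
#include <stdint.h>

/* Address at which file offset 0 would be mapped, given a mapping that
 * starts at map_start and begins map_pgoff bytes into the file. */
static uint64_t toy_report_base(uint64_t map_start, uint64_t map_pgoff)
{
	assert(map_start >= map_pgoff);
	return map_start - map_pgoff;
}

/* File offset that a sampled address inside the mapping corresponds to. */
static uint64_t toy_file_offset(uint64_t ip, uint64_t map_start,
				uint64_t map_pgoff)
{
	assert(ip >= map_start);
	return ip - map_start + map_pgoff;
}
```

For a hypothetical mapping at 0x563b9cb50000 with pgoff 0x1000, the file would be reported at 0x563b9cb4f000, and the address 0x563b9cb502f7 would then sit 0x12f7 bytes into the file, whichever way it is computed. Reporting the file at the mapping start instead would shift every lookup by pgoff.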
[PATCH] perf util: take pgoff into account when reporting elf to libdwfl
ke_impl >, double>::_Async_state_impl(std::thread::_Invoker 563b9cb51149 std::__invoke_result >, double>::_Async_state_impl(std::thread::_Invoker >> 563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker >, double>::_Async_state_> 563b9cb51149 std::thread::_Invoker >, double>::_Async_state_impl(std::thread::_Invoker 563b9cb51149 std::thread::_State_impl >, double>::_Async_state_impl(std::thread> 7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25) 7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so) 7f38e42ccb22 __GI___clone+0x42 (inlined) ``` Before this patch, using libdwfl, you would see: ``` cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120 b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux) b16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux) b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux) b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux) b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux) b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux) 7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so) a041161e77950c5c [unknown] ([unknown]) ``` With this patch applied, we get a bit further in unwinding: ``` cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120 b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux) b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux) b16737cc 
rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux) b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux) b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux) b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux) b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux) 7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so) 7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so) 7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so) 7f38e42569e5 __GI___libc_malloc+0x115 (inlined) 7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined) 7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined) 7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined) 7f38e424df36 _IO_new_file_xsputn+0x116 (inlined) 7f38e4242bfb __GI__IO_fwrite+0xdb (inlined) 7f38e463fa6d std::basic_streambuf >::sputn(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator >::_M_put(char const*, long)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator > std::__write(std::ostreambuf_iterator >, char const*, int)+0x1cd (inlined) 7f38e463fa6d std::ostreambuf_iterator > std::num_put > >::_M_insert_float(std::ostreambuf_iterator 7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl> 7f38e464bd70 std::ostream& std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25) 563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined) 563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking) 6eab825c1ee3e4ff [unknown] ([unknown]) ``` Note that the backtrace is still stopping too early, when compared to the nice results obtained via libunwind. It's unclear so far what the reason for that is. 
Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
---
 tools/perf/util/unwind-libdw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 6f318b15950e..5eff9bfc5758 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -45,13 +45,13 @@ static int __report_module(struct addr_location *al, u64 ip,
 		Dwarf_Addr s;

 		dwfl_module_info(mod, NULL,
[PATCH] perf util: take pgoff into account when reporting elf to libdwfl
ke_impl >, double>::_Async_state_impl(std::thread::_Invoker
    563b9cb51149 std::__invoke_result >, double>::_Async_state_impl(std::thread::_Invoker >>
    563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker >, double>::_Async_state_>
    563b9cb51149 std::thread::_Invoker >, double>::_Async_state_impl(std::thread::_Invoker
    563b9cb51149 std::thread::_State_impl >, double>::_Async_state_impl(std::thread>
    7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25)
    7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so)
    7f38e42ccb22 __GI___clone+0x42 (inlined)
```

Before this patch, using libdwfl, you would see:

```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
    b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
    a041161e77950c5c [unknown] ([unknown])
```

With this patch applied, we get a bit further in unwinding:

```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
    b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
    7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
    7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
    7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
    7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
    7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
    7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
    7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
    7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
    7f38e463fa6d std::basic_streambuf >::sputn(char const*, long)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator >::_M_put(char const*, long)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator > std::__write(std::ostreambuf_iterator >, char const*, int)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator > std::num_put > >::_M_insert_float(std::ostreambuf_iterator
    7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl>
    7f38e464bd70 std::ostream& std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
    563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
    563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
    6eab825c1ee3e4ff [unknown] ([unknown])
```

Note that the backtrace is still stopping too early, when compared to the nice results obtained via libunwind. It's unclear so far what the reason for that is.

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
---
 tools/perf/util/unwind-libdw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 6f318b15950e..5eff9bfc5758 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -45,13 +45,13 @@ static int __report_module(struct addr_location *al, u64 ip,
 		Dwarf_Addr s;

 		dwfl_module_info(mod, NULL,
[tip:perf/urgent] perf script: Flush output stream after events in verbose mode
Commit-ID:  7ee40678af935fb489b0c6cf0f75808175214cd7
Gitweb:     https://git.kernel.org/tip/7ee40678af935fb489b0c6cf0f75808175214cd7
Author:     Milian Wolff
AuthorDate: Sun, 21 Oct 2018 21:14:24 +0200
Committer:  Arnaldo Carvalho de Melo
CommitDate: Mon, 22 Oct 2018 14:27:11 -0300

perf script: Flush output stream after events in verbose mode

When the perf script output is written to a terminal stream, the normal output of `perf script` would get buffered, but its debug output would be written directly. This made it quite hard to figure out where a given debug output is coming from. We can improve on this by flushing the output buffer after processing an event. To see the value, compare the following output for a `perf script -v` run:

Before this patch:

```
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
... lots and lots of verbose debug output
cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
        7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)

cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
        7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

After this patch:

```
...
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
        7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)

unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
        7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

This new output format makes it much easier to use perf script output for debugging purposes, e.g. to investigate broken dwarf unwinding.

Signed-off-by: Milian Wolff
Acked-by: Jiri Olsa
Link: http://lkml.kernel.org/r/20181021191424.16183-2-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index bd468b90801b..ca09b7d2adb7 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,

 	if (PRINT_FIELD(METRIC))
 		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
+
+	if (verbose)
+		fflush(fp);
 }

 static struct scripting_ops	*scripting_ops;
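The buffering effect the commit message describes can be reproduced outside of perf. What follows is only a minimal sketch, not perf code: `emit_event()` and `bytes_written_out()` are hypothetical helper names, and the sketch assumes a POSIX system (for `fileno()` and `fstat()`). It writes event lines to a fully buffered stream and flushes only in verbose mode, which is the same idea as the `if (verbose) fflush(fp);` hunk above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for the tail of process_event(): write one event
 * line to fp and, in verbose mode, flush it right away so that it appears
 * next to the unbuffered debug output that was printed while processing it.
 */
static int emit_event(FILE *fp, const char *line, int verbose)
{
	assert(fp != NULL && line != NULL);

	if (fputs(line, fp) == EOF)
		return -1;
	if (verbose && fflush(fp) == EOF)
		return -1;
	return 0;
}

/*
 * Size of the file behind fp as seen by the kernel. Bytes that are still
 * sitting in the stdio buffer are not counted. Returns -1 on error.
 */
static long bytes_written_out(FILE *fp)
{
	struct stat st;

	if (fstat(fileno(fp), &st) != 0)
		return -1;
	return (long)st.st_size;
}
```

With a fully buffered stream (for example a `tmpfile()` configured via `setvbuf(fp, NULL, _IOFBF, BUFSIZ)`), a short line emitted with `verbose == 0` stays in the stdio buffer, whereas the same call with `verbose != 0` pushes everything buffered so far out to the file.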
[tip:perf/urgent] perf script: Allow extended console debug output
Commit-ID:  c1c9b9695cc8868048f45c7e2559f65bc0be7382
Gitweb:     https://git.kernel.org/tip/c1c9b9695cc8868048f45c7e2559f65bc0be7382
Author:     Milian Wolff
AuthorDate: Sun, 21 Oct 2018 21:14:23 +0200
Committer:  Arnaldo Carvalho de Melo
CommitDate: Mon, 22 Oct 2018 12:37:53 -0300

perf script: Allow extended console debug output

The script tool isn't using a browser, yet use_browser wasn't set explicitly to zero. This in turn led to confusing output such as:

```
$ perf script -vvv ...
...
overlapping maps in /home/milian/foobar (disable tui for more info)
...
```

Explicitly set use_browser to 0, which gives us the extended debug information in perf script as expected.

Signed-off-by: Milian Wolff
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Link: http://lkml.kernel.org/r/20181021191424.16183-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4da5e32b9e03..bd468b90801b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3417,8 +3417,10 @@ int cmd_script(int argc, const char **argv)
 		exit(-1);
 	}

-	if (!script_name)
+	if (!script_name) {
 		setup_pager();
+		use_browser = 0;
+	}

 	session = perf_session__new(, false, );
 	if (session == NULL)
Re: Broken dwarf unwinding - wrong stack pointer register value?
On Tuesday, 23 October 2018 06:03:56 CEST Andi Kleen wrote:
> > So what if my libm wasn't compiled with -fasynchronous-unwind-tables? We
>
> It's default (64bit since always and 32bit now too) Unless someone disabled
> it.

Excellent, good to know. Since [1] doesn't explicitly disable it, I would assume the information should be available.

[1]: https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/glibc

> However libm might be partially written in assembler and hand written
> assembler often has problems with unwind tables because the programmer has
> to get them correct explicitly.

Yes, that could be the case. I'm unsure about the glibc build system and what actually gets compiled, but I found a potential source at [2]:

[2]: https://github.com/bminor/glibc/blob/43b1048ab9418e902aac8c834a7a9a88c501620a/sysdeps/ieee754/dbl-64/e_hypot.c

I believe this is what is used on my system, since I can spot calls to __issignaling@@GLIBC_2.18 etc. in the disassembly of __hypot_finite ([3]), which matches the sources referenced in [2].

[3]: https://paste.kde.org/poywa7y2z

If [2] is used, then it's not hand written assembler but code compiled by the compiler. So unwinding should work, even from the prologue?

I have since also figured out how to dump the .eh_frame contents in a human readable format via readelf. Remember, __hypot_finite on my system is at offset 0x29660 of libm, so I think the following are the corresponding .eh_frame contents:

```
$ readelf --debug-dump=frames /usr/lib/libm.so.6 |& less
...
2b60 004c 2b64 FDE cie= pc=00029660..000299ce
  DW_CFA_advance_loc: 6 to 00029666
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r13 (r13) at cfa-16
  DW_CFA_advance_loc: 2 to 00029668
  DW_CFA_def_cfa_offset: 24
  DW_CFA_offset: r12 (r12) at cfa-24
  DW_CFA_advance_loc: 1 to 00029669
  DW_CFA_def_cfa_offset: 32
  DW_CFA_offset: r6 (rbp) at cfa-32
  DW_CFA_advance_loc: 6 to 0002966f
  DW_CFA_def_cfa_offset: 40
  DW_CFA_offset: r3 (rbx) at cfa-40
  DW_CFA_advance_loc: 29 to 0002968c
  DW_CFA_def_cfa_offset: 80
  DW_CFA_advance_loc2: 291 to 000297af
  DW_CFA_remember_state
  DW_CFA_def_cfa_offset: 40
  DW_CFA_advance_loc: 5 to 000297b4
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 1 to 000297b5
  DW_CFA_def_cfa_offset: 24
  DW_CFA_advance_loc: 2 to 000297b7
  DW_CFA_def_cfa_offset: 16
  DW_CFA_advance_loc: 2 to 000297b9
  DW_CFA_def_cfa_offset: 8
  DW_CFA_advance_loc: 7 to 000297c0
  DW_CFA_restore_state
  DW_CFA_advance_loc1: 88 to 00029818
  DW_CFA_remember_state
  DW_CFA_def_cfa_offset: 40
  DW_CFA_advance_loc: 1 to 00029819
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 1 to 0002981a
  DW_CFA_def_cfa_offset: 24
  DW_CFA_advance_loc: 2 to 0002981c
  DW_CFA_def_cfa_offset: 16
  DW_CFA_advance_loc: 2 to 0002981e
  DW_CFA_def_cfa_offset: 8
  DW_CFA_advance_loc: 18 to 00029830
  DW_CFA_restore_state
  DW_CFA_nop
```

I notice that this does not touch the rsp register at all, even though it's mutated by the code, leading to the issue. See again the paste at [3] for the disassembly, and note that the broken sample frame points at:

0x00029688 <+40>: sub $0x28,%rsp

Can someone at least confirm whether unwinding from a function prologue via .eh_frame (but without .debug_frame) should actually be possible?

Thanks
--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
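As a cross-check of the FDE dump in this mail, here is a small C sketch that replays the prologue part of the table and reports where an unwinder would look for the return address at a given pc. It is an illustration under stated assumptions, not how libunwind or libdw are implemented: the CIE is not shown in the dump, so the sketch assumes the conventional x86-64 rules (CFA = rsp + 8 at function entry, return address stored at CFA - 8), and `struct cfa_row`, `hypot_prologue` and `return_address_offset()` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From `pc` onwards (until the next row), CFA = rsp + cfa_offset. */
struct cfa_row {
	uint64_t pc;
	int64_t cfa_offset;
};

/* DW_CFA_advance_loc / DW_CFA_def_cfa_offset pairs of the prologue. */
static const struct cfa_row hypot_prologue[] = {
	{ 0x29660,  8 }, /* assumption: initial rule from the CIE */
	{ 0x29666, 16 },
	{ 0x29668, 24 },
	{ 0x29669, 32 },
	{ 0x2966f, 40 },
	{ 0x2968c, 80 },
};

#define HYPOT_PROLOGUE_ROWS (sizeof(hypot_prologue) / sizeof(hypot_prologue[0]))
/* First DW_CFA_remember_state; the rows above are not valid from here on. */
#define HYPOT_PROLOGUE_END 0x297afULL

/*
 * Return the rsp-relative offset of the return address slot at `pc`,
 * or -1 if `pc` is outside [rows[0].pc, end).
 */
static int64_t return_address_offset(const struct cfa_row *rows, size_t n,
				     uint64_t end, uint64_t pc)
{
	int64_t cfa_offset = -1;
	size_t i;

	assert(rows != NULL && n > 0);

	if (pc < rows[0].pc || pc >= end)
		return -1;

	/* The last row whose pc is <= the queried pc is the active one. */
	for (i = 0; i < n && rows[i].pc <= pc; i++)
		cfa_offset = rows[i].cfa_offset;

	/* assumption: the return address is saved at CFA - 8 */
	return cfa_offset - 8;
}
```

Under these assumptions the table yields rsp + 32 for pc 0x29688 and rsp + 72 for pc 0x29696, which are the same offsets that show up in the `access_mem ... offset 32` and `offset 72` lines of the `perf script -v` log from the original report in this thread. In other words, the `DW_CFA_def_cfa_offset` entries are what track the changes to rsp, provided the CFA is defined relative to rsp.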
Re: Broken dwarf unwinding - wrong stack pointer register value?
On Monday, 22 October 2018 15:58:17 CEST Andi Kleen wrote:
> Milian Wolff writes:
> > After more digging, it turns out that I've apparently chased a red
> > herring.
> > I'm running archlinux which isn't shipping debug symbols for libm.
>
> 64bit executables normally have unwind information even when stripped.
> Unless someone forcefully stripped those too.
>
> You can checkout with objdump --sections.

Right, we do have .eh_frame and .eh_frame_hdr according to readelf:

```
$ readelf --sections /usr/lib/libm.so.6
There are 26 section headers, starting at offset 0x183120:

Section Headers:
  [Nr] Name Type Address Offset Size EntSize Flags Link Info Align
  [ 0] NULL 0 0 0
  [ 1] .note.gnu.build-i NOTE 0270 0270 0024 A 0 0 4
  [ 2] .note.ABI-tag NOTE 0294 0294 0020 A 0 0 4
  [ 3] .note.gnu.propert NOTE 02b8 02b8 0020 A 0 0 8
  [ 4] .gnu.hash GNU_HASH 02d8 02d8 24d0 A 5 0 8
  [ 5] .dynsym DYNSYM 27a8 27a8 66c0 0018 A 6 1 8
  [ 6] .dynstr STRTAB 8e68 8e68 2352 A 0 0 1
  [ 7] .gnu.version VERSYM b1ba b1ba 0890 0002 A 5 0 2
  [ 8] .gnu.version_d VERDEF ba50 ba50 017c A 611 8
  [ 9] .gnu.version_r VERNEED bbd0 bbd0 0060 A 6 2 8
  [10] .rela.dyn RELA bc30 bc30 0480 0018 A 5 0 8
  [11] .init PROGBITS d000 d000 001b AX 0 0 4
  [12] .text PROGBITS d020 d020 000a063b AX 0 0 16
  [13] .fini PROGBITS 000ad65c 000ad65c 000d AX 0 0 4
  [14] .rodata PROGBITS 000ae000 000ae000 000c76a4 A 0 0 32
  [15] .eh_frame_hdr PROGBITS 001756a4 001756a4 1c34 A 0 0 4
  [16] .eh_frame PROGBITS 001772d8 001772d8 93f0 A 0 0 8
  [17] .hash HASH 001806c8 001806c8 210c 0004 A 5 0 8
  [18] .init_array INIT_ARRAY 00183c80 00182c80 0008 0008 WA 0 0 8
  [19] .fini_array FINI_ARRAY 00183c88 00182c88 0008 0008 WA 0 0 8
  [20] .dynamic DYNAMIC 00183c90 00182c90 01f0 0010 WA 6 0 8
  [21] .got PROGBITS 00183e80 00182e80 0180 0008 WA 0 0 8
  [22] .data PROGBITS 00184000 00183000 000c WA 0 0 8
  [23] .bss NOBITS 0018400c 0018300c 000c WA 0 0 4
  [24] .comment PROGBITS 0018300c 001a 0001 MS 0 0 1
  [25] .shstrtab STRTAB 00183026 00fa 0 0 1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
```

But should that be enough information to be able to unwind from a function prologue? I mean, it obviously seems to work when we unwind from the function body. But how would I know whether it should work from the prologue too?

Reading e.g. https://www.airs.com/blog/archives/460, I can find:

> There should be exactly one FDE covering each instruction which may be being executed when an exception occurs. By default an exception can only o
Re: Broken dwarf unwinding - wrong stack pointer register value?
On Monday, 22 October 2018 12:35:39 CEST Milian Wolff wrote:
> On Sunday, 21 October 2018 00:39:51 CEST Milian Wolff wrote:
> > Hey all,
> >
> > I'm on the quest to figure out why perf regularly fails to unwind (some)
> > samples. I am seeing very strange behavior, where an apparently wrong
> > stack pointer value is read from the register - see below for more
> > information and the end of this (long) mail for my open questions. Any
> > help would be greatly appreciated.
> >
> > I am currently using this trivial C++ code to reproduce the issue:
> >
> > ```
> > #include <cmath>
> > #include <complex>
> > #include <iostream>
> > #include <random>
> >
> > using namespace std;
> >
> > int main()
> > {
> >     uniform_real_distribution<double> uniform(-1E5, 1E5);
> >     default_random_engine engine;
> >     double s = 0;
> >     for (int i = 0; i < 1000; ++i) {
> >         s += norm(complex<double>(uniform(engine), uniform(engine)));
> >     }
> >     cout << s << '\n';
> >     return 0;
> > }
> > ```
> >
> > I compile it with `g++ -O2 -g` and then record it with `perf record
> > --call-graph dwarf`. Using perf script, I then see e.g.:
> >
> > ```
> > $ perf script -v --no-inline --time 90229.12668,90229.127158 --ns
> > ...
> > # first frame (working unwinding from __hypot_finite):
> > unwind: reg 16, val 7faf7dca2696
> > unwind: reg 7, val 7ffc80811ca0
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811ce8 val 7faf7dc88af9, offset 72
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811d08 val 56382b0fc129, offset 104
> > unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
> > unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 184
> > unwind: find_proc_info dso /usr/lib/libc-2.28.so
> > unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 376
> > unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
> > unwind: __hypot_finite:ip = 0x7faf7dca2696 (0x29696)
> > unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
> > unwind: main:ip = 0x56382b0fc128 (0x1128)
> > unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> > unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> > # second frame (unrelated)
> > unwind: reg 16, val 56382b0fc114
> > unwind: reg 7, val 7ffc80811d10
> > unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
> > unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 72
> > unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 264
> > unwind: main:ip = 0x56382b0fc114 (0x1114)
> > unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> > unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> > # third frame (broken unwinding from __hypot_finite)
> > unwind: reg 16, val 7faf7dca2688
> > unwind: reg 7, val 7ffc80811ca0
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811cc0 val 0, offset 32
> > unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
> > unwind: '':ip = 0x (0x0)
> >
> > cpp-inlining 24617 90229.126685606: 711026 cycles:uppp:
> >         7faf7dca2696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
> >         7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
> >         56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
> >         7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
> >         56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
> >
> > cpp-inlining 24617 90229.126921551: 714657 cycles:uppp:
> >         56382b0fc114 main+0x74 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
> >         7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
> >         56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
> >
> > cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
> >         7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
>
Re: [PATCH 2/2] perf script: flush output stream after events in verbose mode
On Monday, 22 October 2018 12:16:18 CEST Jiri Olsa wrote:
> On Mon, Oct 22, 2018 at 12:09:22PM +0200, Milian Wolff wrote:
> > On Monday, 22 October 2018 11:43:17 CEST Jiri Olsa wrote:
> > > On Sun, Oct 21, 2018 at 09:14:24PM +0200, Milian Wolff wrote:
> > > > When the perf script output is written to a terminal stream,
> > > > the normal output of `perf script` would get buffered, but its
> > > > debug output would be written directly. This made it quite hard
> > > > to figure out where a given debug output is coming from. We can
> > > > improve on this by flushing the output buffer after processing an
> > > > event. To see the value, compare the following output for a
> > > > `perf script -v` run:
> > > >
> > > > Before this patch:
> > > > ```
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > ... lots and lots of verbose debug output
> > > >
> > > > cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
> > > >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > >
> > > > cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
> > > >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > >
> > > > ...
> > > > ```
> > > >
> > > > After this patch:
> > > > ```
> > > > ...
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > >
> > > > cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
> > > >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > >
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > >
> > > > cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
> > > >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > >
> > > > ...
> > > > ```
> > > >
> > > > This new output format makes it much easier to use perf script
> > > > output for debugging purposes, e.g. to investigate broken dwarf
> > > > unwinding.
> > >
> > > yep, I plan to check on this ;-)
> > >
> > > > Signed-off-by: Milian Wolff
> > > > Cc: Arnaldo Carvalho de Melo
> > > > ---
> > > >
> > > >  tools/perf/builtin-script.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > > > index bd468b90801b..ca09b7d2adb7 100644
> > > > --- a/tools/perf/builtin-script.c
> > > > +++ b/tools/perf/builtin-script.c
> > > > @@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
> > > >
> > > >  	if (PRINT_FIELD(METRIC))
> > > >  		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
> > > > +
> > > > +	if (verbose)
> > > > +		fflush(fp);
> > >
> > > should we call fflush(NULL) to dump all the streams?
> > >
> > > the verbose goes to stderr and fp seems to be stdout by default
> >
> > stderr isn't buffered, so we don't need to flush it. So personally, I don't
> > see a need to dump all streams - fp should be enough? Can you maybe explain
> > where it would be required to flush more buffers?
>
> hum, did not know stderr wasn't buffered
>
> I think there's perf script feature to store the events data to
> separate files per each event.. but I guess we don't need to
> flush them.. we just need to have stdout and stderr in sync IIUC

Exactly, and that's achieved with this patch from what I see :)

Or should we maybe instead call setbuf(fp, NULL); in verbose mode?

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
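The `setbuf(fp, NULL)` alternative raised at the end of this reply could look roughly like the following. This is a minimal sketch, not perf code: `setup_output` and its `verbose` parameter are hypothetical names chosen for illustration. It uses `setvbuf` with `_IONBF`, which has the same effect as `setbuf(fp, NULL)` but reports failure.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of perf: make fp unbuffered when
 * verbose output is requested. The C standard requires this to happen
 * before any other operation is performed on the stream.
 */
static int setup_output(FILE *fp, int verbose)
{
	assert(fp != NULL);

	if (!verbose)
		return 0;	/* keep the default buffering */

	/* Same effect as setbuf(fp, NULL), but the result can be checked. */
	return setvbuf(fp, NULL, _IONBF, 0) == 0 ? 0 : -1;
}
```

Such a helper would be called once, right after the output stream is opened. The trade-off against the patch is that an unbuffered stream passes every single write through immediately, whereas one `fflush(fp)` per event keeps buffering within an event and only synchronizes at event boundaries.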
Re: Broken dwarf unwinding - wrong stack pointer register value?
On Sunday, 21 October 2018 00:39:51 CEST Milian Wolff wrote:
> Hey all,
>
> I'm on the quest to figure out why perf regularly fails to unwind (some)
> samples. I am seeing very strange behavior, where an apparently wrong stack
> pointer value is read from the register - see below for more information and
> the end of this (long) mail for my open questions. Any help would be
> greatly appreciated.
>
> I am currently using this trivial C++ code to reproduce the issue:
>
> ```
> #include <cmath>
> #include <complex>
> #include <iostream>
> #include <random>
>
> using namespace std;
>
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 1000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ```
>
> I compile it with `g++ -O2 -g` and then record it with
> `perf record --call-graph dwarf`. Using perf script, I then see e.g.:
>
> ```
> $ perf script -v --no-inline --time 90229.12668,90229.127158 --ns
> ...
> # first frame (working unwinding from __hypot_finite):
> unwind: reg 16, val 7faf7dca2696
> unwind: reg 7, val 7ffc80811ca0
> unwind: find_proc_info dso /usr/lib/libm-2.28.so
> unwind: access_mem addr 0x7ffc80811ce8 val 7faf7dc88af9, offset 72
> unwind: find_proc_info dso /usr/lib/libm-2.28.so
> unwind: access_mem addr 0x7ffc80811d08 val 56382b0fc129, offset 104
> unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
> unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 184
> unwind: find_proc_info dso /usr/lib/libc-2.28.so
> unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 376
> unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
> unwind: __hypot_finite:ip = 0x7faf7dca2696 (0x29696)
> unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
> unwind: main:ip = 0x56382b0fc128 (0x1128)
> unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> # second frame (unrelated)
> unwind: reg 16, val 56382b0fc114
> unwind: reg 7, val 7ffc80811d10
> unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
> unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 72
> unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 264
> unwind: main:ip = 0x56382b0fc114 (0x1114)
> unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> # third frame (broken unwinding from __hypot_finite)
> unwind: reg 16, val 7faf7dca2688
> unwind: reg 7, val 7ffc80811ca0
> unwind: find_proc_info dso /usr/lib/libm-2.28.so
> unwind: access_mem addr 0x7ffc80811cc0 val 0, offset 32
> unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
> unwind: '':ip = 0x (0x0)
> cpp-inlining 24617 90229.126685606: 711026 cycles:uppp:
>     7faf7dca2696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
>     7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
>     56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
>     7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
>     56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
>
> cpp-inlining 24617 90229.126921551: 714657 cycles:uppp:
>     56382b0fc114 main+0x74 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
>     7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
>     56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
>
> cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
>     7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
>     [unknown] ([unknown])
> ...
> ```
>
> Now I'm trying to figure out why one __hypot_finite sample works but the
> other one breaks for no apparent reason.

I've now collected some more background information, which I believe is quite helpful for the analysis of this issue:

Note how the broken sample has the IP pointing at __hypot_finite+0x28:

unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)

When we run my reproducer code in GDB, we can see that obtaining a backtrace from that address works just fine there:

```
$ gdb ./cpp-inlining
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPL
```
Re: [PATCH 2/2] perf script: flush output stream after events in verbose mode
On Monday, 22 October 2018 11:43:17 CEST Jiri Olsa wrote:
> On Sun, Oct 21, 2018 at 09:14:24PM +0200, Milian Wolff wrote:
> > When the perf script output is written to a terminal stream,
> > the normal output of `perf script` would get buffered, but its
> > debug output would be written directly. This made it quite hard
> > to figure out where a given debug output is coming from. We can
> > improve on this by flushing the output buffer after processing an
> > event. To see the value, compare the following output for a
> > `perf script -v` run:
> >
> > Before this patch:
> > ```
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > ... lots and lots of verbose debug output
> >
> > cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
> >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> >
> > cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
> >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> >
> > ...
> > ```
> >
> > After this patch:
> > ```
> > ...
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> >
> > cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
> >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> >
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> >
> > cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
> >     7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> >
> > ...
> > ```
> >
> > This new output format makes it much easier to use perf script
> > output for debugging purposes, e.g. to investigate broken dwarf
> > unwinding.
>
> yep, I plan to check on this ;-)
>
> > Signed-off-by: Milian Wolff
> > Cc: Arnaldo Carvalho de Melo
> > ---
> >
> >  tools/perf/builtin-script.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index bd468b90801b..ca09b7d2adb7 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
> >
> >  	if (PRINT_FIELD(METRIC))
> >  		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
> > +
> > +	if (verbose)
> > +		fflush(fp);
>
> should we call fflush(NULL) to dump all the streams?
>
> the verbose goes to stderr and fp seems to be stdout by default

stderr isn't buffered, so we don't need to flush it. So personally, I don't see a need to dump all streams - fp should be enough? Can you maybe explain where it would be required to flush more buffers?

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
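The argument in this reply, that only `fp` needs flushing because stderr is not buffered, can be sketched as a small standalone C example. This is an illustration under assumptions, not perf code: `emit_sample` and `emit_debug` are hypothetical stand-ins for the end of `process_event()` and for perf's debug printing. The C standard only guarantees that stderr is not fully buffered; in practice it is typically unbuffered.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the tail of process_event(): write one
 * sample to fp and, in verbose mode, flush it right away so it cannot
 * lag behind debug output that was written to stderr in the meantime.
 */
static int emit_sample(FILE *fp, const char *sample, int verbose)
{
	assert(fp != NULL && sample != NULL);

	if (fputs(sample, fp) == EOF)
		return -1;
	if (verbose && fflush(fp) == EOF)
		return -1;
	return 0;
}

/* Debug output goes to stderr, so no explicit flush is issued here. */
static void emit_debug(const char *msg)
{
	fputs(msg, stderr);
}
```

Calling `emit_debug()` followed by `emit_sample(stdout, ..., 1)` for each event keeps the two streams interleaved in event order, which is the effect the patch is after. `fflush(NULL)` would additionally flush every other open output stream, which is not needed for that.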
Re: Broken dwarf unwinding - wrong stack pointer register value?
On Sunday, 21 October 2018 00:39:51 CEST Milian Wolff wrote:
> Hey all,
>
> I'm on the quest to figure out why perf regularly fails to unwind (some)
> samples. I am seeing very strange behavior, where an apparently wrong stack
> pointer value is read from the register - see below for more information and
> the end of this (long) mail for my open questions. Any help would be
> greatly appreciated.
>
> I am currently using this trivial C++ code to reproduce the issue:
>
> ```
> #include <cmath>
> #include <complex>
> #include <iostream>
> #include <random>
>
> using namespace std;
>
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 1000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ```
>
> I compile it with `g++ -O2 -g` and then record it with
> `perf record --call-graph dwarf`. Using perf script, I then see e.g.:

With my patch to regularly flush the perf script output buffer, we can now easily find all broken backtraces and the corresponding debug output via:

$ perf script --ns -v |& awk -v RS='' '/\[unknown\]/ {print "\n"$0}'

I've pasted the output of the above command from my machine here:

https://paste.kde.org/pmyxwkk1k

This contains 139 samples with broken unwinding, out of 2350 samples in total, so about 6% of all samples are broken.

In many cases, the first value read from memory is 0 because a too-low offset into the stack is computed from the SP value, similar to the scenario I described in my initial mail. In other cases we read garbage addresses such as

unwind: access_mem addr 0x7ffc80811cf0 val 408195dfbda90580, offset 24

In all cases except for the two samples at the very start and end of this log, the last offset encountered in access_mem is lower than 72. Remember what I wrote in the initial mail - when I manually hacked the access_mem function to use 72 for one of the broken samples, unwinding magically worked again...
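For reference, the offsets in the debug output are consistent with a plain subtraction of the sampled stack pointer (`reg 7`) from the accessed address. The sketch below rests on that assumption, which matches every `access_mem` line quoted in this thread but is not taken from the perf sources; `stack_offset` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: offset of a stack read relative to the sampled
 * stack pointer, assuming perf reports offset = addr - SP.
 */
static uint64_t stack_offset(uint64_t addr, uint64_t sp)
{
	assert(addr >= sp);	/* reads below SP are outside the dumped stack */
	return addr - sp;
}
```

With the values from the initial mail, the working sample gives 0x7ffc80811ce8 - 0x7ffc80811ca0 = 0x48 = 72, while the broken sample with the same SP gives 0x7ffc80811cc0 - 0x7ffc80811ca0 = 0x20 = 32. Both samples report the same SP, so the difference lies in which address the unwinder decides to read.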
With this additional data - can anyone shed some light on what's potentially going on here? How can we improve this situation?

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
[PATCH 2/2] perf script: flush output stream after events in verbose mode
When the perf script output is written to a terminal stream, the normal
output of `perf script` would get buffered, but its debug output would be
written directly. This made it quite hard to figure out where a given debug
output is coming from. We can improve on this by flushing the output buffer
after processing an event.

To see the value, compare the following output for a `perf script -v` run:

Before this patch:

```
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
... lots and lots of verbose debug output
cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
    7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
    7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

After this patch:

```
...
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122036534: 1 cycles:uppp:
    7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122043974: 1 cycles:uppp:
    7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

This new output format makes it much easier to use perf script output for
debugging purposes, e.g. to investigate broken dwarf unwinding.

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index bd468b90801b..ca09b7d2adb7 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
 
 	if (PRINT_FIELD(METRIC))
 		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
+
+	if (verbose)
+		fflush(fp);
 }
 
 static struct scripting_ops *scripting_ops;
--
2.19.1
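The reordering described in this patch can be reproduced outside of perf. The following is a minimal sketch, not perf code: `interleave_demo` is a hypothetical helper that writes two fake "sample" lines to a fully buffered stream and two fake "unwind" lines to an unbuffered stream, both backed by the same temporary file, and then reads the file back. It assumes a POSIX system (`dup`, `fdopen`, `lseek`, `read`).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical demo, not perf code: write two "samples" to a fully
 * buffered stream and two debug lines to an unbuffered stream that
 * share one temporary file, then read the file back into buf.
 * Returns 0 on success, -1 on error.
 */
static int interleave_demo(int flush_after_event, char *buf, size_t buflen)
{
	FILE *out, *dbg;
	ssize_t n = -1;
	int fd, i;

	if (buflen == 0)
		return -1;

	out = tmpfile();		/* stands in for the buffered output stream */
	if (!out)
		return -1;

	fd = dup(fileno(out));		/* same open file, shared file offset */
	if (fd < 0) {
		fclose(out);
		return -1;
	}
	dbg = fdopen(fd, "w");		/* stands in for the debug stream */
	if (!dbg) {
		close(fd);
		fclose(out);
		return -1;
	}
	setvbuf(out, NULL, _IOFBF, BUFSIZ);
	setvbuf(dbg, NULL, _IONBF, 0);

	for (i = 1; i <= 2; i++) {
		fprintf(dbg, "unwind %d\n", i);	/* reaches the file immediately */
		fprintf(out, "sample %d\n", i);	/* stays in the stdio buffer */
		if (flush_after_event)
			fflush(out);		/* the step the patch adds for -v */
	}
	fflush(out);

	/* Read the combined output back through the shared descriptor. */
	if (lseek(fd, 0, SEEK_SET) == 0)
		n = read(fd, buf, buflen - 1);
	fclose(dbg);
	fclose(out);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}
```

Calling `interleave_demo(0, buf, sizeof(buf))` should leave both "unwind" lines ahead of both "sample" lines, while `interleave_demo(1, ...)` should keep each "sample" line directly after its "unwind" line, which mirrors the before/after logs in the patch description.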
[PATCH 1/2] perf script: allow extended console debug output
The script tool isn't using a browser, yet use_browser wasn't set explicitly
to zero. This in turn led to confusing output such as:

```
$ perf script -vvv ...
...
overlapping maps in /home/milian/foobar (disable tui for more info)
...
```

Explicitly set use_browser to 0, which gives us the extended debug
information in perf script as expected.

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-script.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4da5e32b9e03..bd468b90801b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3417,8 +3417,10 @@ int cmd_script(int argc, const char **argv)
 		exit(-1);
 	}
 
-	if (!script_name)
+	if (!script_name) {
 		setup_pager();
+		use_browser = 0;
+	}
 
 	session = perf_session__new(, false, );
 	if (session == NULL)
--
2.19.1
Broken dwarf unwinding - wrong stack pointer register value?
eaningful value... This offset is calculated from LIBUNWIND__ARCH_REG_SP,
cf. unwind-libunwind-local.c. So is the stack pointer address in the
register wrong? If I hackishly offset the value for the stack pointer by 40,
i.e.:

```
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index 79f521a552cf..a766ddaaa4dd 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -502,6 +502,7 @@ static int access_mem(unw_addr_space_t __maybe_unused as,
 	if (ret)
 		return ret;
 
+	start -= 40;
 	end = start + stack->size;
 
 	/* Check overflow. */
```

Then I can successfully unwind the broken sample:

```
$ perf script -v --no-inline --time 90229.127156,90229.127158 --ns
...
unwind: reg 16, val 7faf7dca2688
unwind: reg 7, val 7ffc80811ca0
unwind: find_proc_info dso /usr/lib/libm-2.28.so
unwind: access_mem addr 0x7ffc80811cc0 val 7faf7dc88af9, offset 72
unwind: find_proc_info dso /usr/lib/libm-2.28.so
unwind: access_mem addr 0x7ffc80811ce0 val 56382b0fc129, offset 104
unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
unwind: access_mem addr 0x7ffc80811d30 val 7faf7dabf223, offset 184
unwind: find_proc_info dso /usr/lib/libc-2.28.so
unwind: access_mem addr 0x7ffc80811df0 val 56382b0fc1ee, offset 376
unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
unwind: main:ip = 0x56382b0fc128 (0x1128)
unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
    7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
    7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
    56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
    7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
    56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
```

So, what now? Here are my open questions:

- Is this just working now by chance, or is this the real reason? I.e. is the
  register value for the stack pointer incorrectly recorded?
- Can this be fixed somehow during record time?
- Can we detect this scenario at analysis time and correct the stack pointer
  address automatically somehow?
- Should the first frame always try to read from offset 72 maybe?

Any help would be greatly appreciated, many thanks

--
Milian Wolff
m...@milianw.de
http://milianw.de
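The offsets printed in the log are consistent with a snapshot that starts 40 bytes below the recorded stack pointer: with reg 7 (the stack pointer) at 0x7ffc80811ca0, the read at 0x7ffc80811cc0 lands at 0x7ffc80811cc0 - (0x7ffc80811ca0 - 40) = 72. The arithmetic can be sketched as follows. `stack_offset` is a hypothetical helper written for illustration only; it is not the perf `access_mem` implementation, and the snapshot size used in the usage note is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not perf code: map an address the unwinder wants
 * to read onto an offset into the stack snapshot taken at sample time.
 * sp is the sampled stack pointer, shift models the "start -= 40" hack.
 * Returns 0 and stores the offset, or -1 if an 8-byte read at addr would
 * fall outside [start, start + size).
 */
static int stack_offset(uint64_t addr, uint64_t sp, uint64_t shift,
			uint64_t size, uint64_t *offset)
{
	uint64_t start = sp - shift;
	uint64_t end = start + size;

	if (end < start)		/* address range wrapped around */
		return -1;
	if (addr < start || addr >= end || end - addr < sizeof(uint64_t))
		return -1;

	*offset = addr - start;
	return 0;
}
```

With sp = 0x7ffc80811ca0, shift = 40 and an assumed snapshot size of 8192 bytes, the four addresses from the log map to offsets 72, 104, 184 and 376, matching the `access_mem` lines. With shift = 0 the first read would map to offset 32 instead.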
[tip:perf/urgent] perf report: Don't crash on invalid inline debug information
Commit-ID: d4046e8e17b9f378cb861982ef71c63911b5dff3
Gitweb: https://git.kernel.org/tip/d4046e8e17b9f378cb861982ef71c63911b5dff3
Author: Milian Wolff
AuthorDate: Wed, 26 Sep 2018 15:52:07 +0200
Committer: Arnaldo Carvalho de Melo
CommitDate: Tue, 16 Oct 2018 14:52:21 -0300

perf report: Don't crash on invalid inline debug information

When the function name for an inline frame is invalid, we must not try to
demangle this symbol, otherwise we crash with:

  #0 0x55895c01 in bfd_demangle ()
  #1 0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
  #2 dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
  #3 0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
  #4 inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
  #5 0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
  #6 0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358

So instead handle the case where we get invalid function names for inlined
frames and use a fallback '??' function name instead.

While this crash was originally reported by Hadrien for rust code, I can now
also reproduce it with trivial C++ code. Indeed, it seems like libbfd fails
to interpret the debug information for the inline frame symbol name:

  $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
  main
  /usr/include/c++/8.2.1/complex:610
  ??
  /usr/include/c++/8.2.1/complex:618
  ??
  /usr/include/c++/8.2.1/complex:675
  ??
  /usr/include/c++/8.2.1/complex:685
  main
  /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there which should
fix this issue:

https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Reported-by: Hadrien Grasland
Signed-off-by: Milian Wolff
Cc: Jin Yao
Cc: Jiri Olsa
Cc: Namhyung Kim
Fixes: a64489c56c30 ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was introduced,
  i.e. using a2l->funcname without checking if it is NULL, but this current
  patch fixes the current codebase, i.e. multiple csets were applied after
  a64489c56c30 before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/srcline.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 09d6746e6ec8..e767c4a9d4d2 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -85,6 +85,9 @@ static struct symbol *new_inline_sym(struct dso *dso,
 	struct symbol *inline_sym;
 	char *demangled = NULL;
 
+	if (!funcname)
+		funcname = "??";
+
 	if (dso) {
 		demangled = dso__demangle_sym(dso, 0, funcname);
 		if (demangled)
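The pattern used by this fix, substituting a placeholder before the name reaches code that dereferences it, can be sketched in isolation. This is a minimal illustration and not the perf implementation: `toy_demangle`, `dup_string` and `inline_sym_name` are hypothetical names, and the toy demangler merely strips a leading "_Z" so that the example stays self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL on allocation failure. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical stand-in for a real demangler. Like the crashing code
 * path, it dereferences name unconditionally, so a NULL argument would
 * be fatal. It only "demangles" by stripping a leading "_Z".
 */
static char *toy_demangle(const char *name)
{
	if (strncmp(name, "_Z", 2) != 0)
		return NULL;		/* not a mangled name */
	return dup_string(name + 2);
}

/*
 * Build the display name for an inline frame. A missing function name
 * is replaced by "??" before anything dereferences it. The caller owns
 * the returned string and must free() it.
 */
static char *inline_sym_name(const char *funcname)
{
	char *demangled;

	if (!funcname)
		funcname = "??";

	demangled = toy_demangle(funcname);
	return demangled ? demangled : dup_string(funcname);
}
```

In this sketch `inline_sym_name(NULL)` yields "??" rather than crashing, `inline_sym_name("main")` yields "main", and `inline_sym_name("_Zfoo")` yields "foo" under the toy demangling rule.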
Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information
On Tuesday, 16 October 2018 19:52:04 CEST Arnaldo Carvalho de Melo wrote:
> On Tue, Oct 16, 2018 at 02:49:23PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Oct 15, 2018 at 10:51:36PM +0200, Milian Wolff wrote:
> > > On Thursday, 11 October 2018 21:39:20 CEST Arnaldo Carvalho de Melo wrote:
> > > > On Thu, Oct 11, 2018 at 08:23:31PM +0200, Milian Wolff wrote:
> > > > > On Thursday, 27 September 2018 21:10:37 CEST Arnaldo Carvalho de Melo wrote:
> > > > > > On Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff wrote:
> > > > > > > When the function name for an inline frame is invalid, we must
> > > > > > > not try to demangle this symbol, otherwise we crash with:
> > > > > > >
> > > > > > > #0 0x55895c01 in bfd_demangle ()
> > > > > > > #1 0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
> > > > > > > #2 dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
> > > > > > > #3 0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
> > > > > > > #4 inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
> > > > > > > #5 0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
> > > > > > > #6 0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358
> > > > > > >
> > > > > > > So instead handle the case where we get invalid function names
> > > > > > > for inlined frames and use a fallback '??' function name instead.
> > > > > > >
> > > > > > > While this crash was originally reported by Hadrien for rust code,
> > > > > > > I can now also reproduce it with trivial C++ code. Indeed, it seems
> > > > > > > like libbfd fails to interpret the debug information for the inline
> > > > > > > frame symbol name:
> > > > > > >
> > > > > > > $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
> > > > > > > main
> > > > > > > /usr/include/c++/8.2.1/complex:610
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:618
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:675
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:685
> > > > > > > main
> > > > > > > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information
On Thursday, 11 October 2018 21:39:20 CEST Arnaldo Carvalho de Melo wrote:
> On Thu, Oct 11, 2018 at 08:23:31PM +0200, Milian Wolff wrote:
> > On Thursday, 27 September 2018 21:10:37 CEST Arnaldo Carvalho de Melo wrote:
> > > On Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff wrote:
> > > > When the function name for an inline frame is invalid, we must
> > > > not try to demangle this symbol, otherwise we crash with:
> > > >
> > > > #0 0x55895c01 in bfd_demangle ()
> > > > #1 0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
> > > > #2 dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
> > > > #3 0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
> > > > #4 inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
> > > > #5 0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
> > > > #6 0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358
> > > >
> > > > So instead handle the case where we get invalid function names
> > > > for inlined frames and use a fallback '??' function name instead.
> > > >
> > > > While this crash was originally reported by Hadrien for rust code,
> > > > I can now also reproduce it with trivial C++ code. Indeed, it seems
> > > > like libbfd fails to interpret the debug information for the inline
> > > > frame symbol name:
> > > >
> > > > $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
> > > > main
> > > > /usr/include/c++/8.2.1/complex:610
> > > > ??
> > > > /usr/include/c++/8.2.1/complex:618
> > > > ??
> > > > /usr/include/c++/8.2.1/complex:675
> > > > ??
> > > > /usr/include/c++/8.2.1/complex:685
> > > > main
> > > > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
> > > >
> > > > I've reported this bug upstream and also attached a patch there
> > > > which should fix this issue:
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=23715
> > >
> > > Milian, what about this one, which is the cset it is fixing?
> >
> > Hey Arnaldo,
> >
> > just noticed this email and that the corresponding patch hasn't landed in
> > perf/core yet. The patch set which introduced this is a64489c56c307 ("perf
> > report: Find the inline stack for a given address"). Note that the code was
> > introduced by this patch, but then subsequently touched and moved by
> > follow up patches. So, is this the patch you want to see referenced?
> > Otherwise, the latest patch which gets fixed is afaik: 7285cf3325b4a
> > ("perf srcline: Show correct function name for srcline of callchains").
> >
> > Can you please pick either of these patches and amend the commit message
> > of my patch and push it to perf/urgent and perf/core?
>
> I'll reread all this later or tomorrow and continue, going AFK now.

Ping?

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information
On Thursday, 27 September 2018 21:10:37 CEST Arnaldo Carvalho de Melo wrote:
> On Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff wrote:
> > When the function name for an inline frame is invalid, we must
> > not try to demangle this symbol, otherwise we crash with:
> >
> > #0 0x55895c01 in bfd_demangle ()
> > #1 0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
> > #2 dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
> > #3 0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
> > #4 inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
> > #5 0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
> > #6 0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358
> >
> > So instead handle the case where we get invalid function names
> > for inlined frames and use a fallback '??' function name instead.
> >
> > While this crash was originally reported by Hadrien for rust code,
> > I can now also reproduce it with trivial C++ code. Indeed, it seems
> > like libbfd fails to interpret the debug information for the inline
> > frame symbol name:
> >
> > $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
> > main
> > /usr/include/c++/8.2.1/complex:610
> > ??
> > /usr/include/c++/8.2.1/complex:618
> > ??
> > /usr/include/c++/8.2.1/complex:675
> > ??
> > /usr/include/c++/8.2.1/complex:685
> > main
> > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
> >
> > I've reported this bug upstream and also attached a patch there
> > which should fix this issue:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=23715
>
> Millian, what about this one, which is the cset it is fixing?

Hey Arnaldo,

just noticed this email and that the corresponding patch hasn't landed in perf/core yet. The patch set which introduced this is a64489c56c307 ("perf report: Find the inline stack for a given address"). Note that the code was introduced by this patch, but then subsequently touched and moved by follow up patches. So, is this the patch you want to see referenced? Otherwise, the latest patch which gets fixed is afaik: 7285cf3325b4a ("perf srcline: Show correct function name for srcline of callchains").

Can you please pick either of these patches and amend the commit message of my patch and push it to perf/urgent and perf/core?

Cheers
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
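As an illustration of the fallback discussed in this thread, here is a minimal sketch in C. It is not the code from the patch: the helper name inline_frame_name() is hypothetical, and the only behavior taken from the thread is that a missing function name must not be handed to the demangler and is shown as '??' instead.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of perf: pick a printable name for an
 * inline frame. A NULL or empty name (as libbfd may return for broken
 * debug information) is replaced by the "??" placeholder, so callers
 * never pass NULL on to a demangler.
 */
const char *inline_frame_name(const char *funcname)
{
	if (funcname == NULL || funcname[0] == '\0')
		return "??";

	return funcname;
}
```

A caller would run this check first and only demangle names that are not the placeholder.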
Re: [PATCH] perf record: use unmapped IP for inline callchain cursors
On Friday, 5 October 2018 15:48:31 CEST Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 03, 2018 at 09:05:37AM +0530, Ravi Bangoria wrote:
> > LGTM.
> >
> > Tested-by: Ravi Bangoria
>
> So, I've added this as a 'git rebase -i' 'fixup', i.e. kept the commit
> log message for the patch this patch fixes, and combined the two into
> just one patch so that we don't pollute the bisect history, since this
> hasn't made it yet to tip, and I also added Ravi's Tested-by, since this
> tests both.

Thanks a lot for the cleanup work Arnaldo.

Cheers
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
[tip:perf/urgent] perf record: Use unmapped IP for inline callchain cursors
Commit-ID:  7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf
Gitweb:     https://git.kernel.org/tip/7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf
Author:     Milian Wolff
AuthorDate: Wed, 26 Sep 2018 15:52:06 +0200
Committer:  Arnaldo Carvalho de Melo
CommitDate: Fri, 5 Oct 2018 11:18:09 -0300

perf record: Use unmapped IP for inline callchain cursors

Only use the mapped IP to find inline frames, but keep using the unmapped IP for the callchain cursor. This ensures we properly show the unmapped IP when displaying a frame we received via the dso__parse_addr_inlines API for a module which does not contain sufficient debug symbols to show the srcline.

This is another follow-up to commit 19610184693c ("perf script: Show virtual addresses instead of offsets").

Signed-off-by: Milian Wolff
Acked-by: Jiri Olsa
Tested-by: Ravi Bangoria
Tested-by: Arnaldo Carvalho de Melo
Cc: Jin Yao
Cc: Namhyung Kim
Cc: Sandipan Das
Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets")
Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wo...@kdab.com
Link: http://lkml.kernel.org/r/20181002073949.3297-1-milian.wo...@kdab.com
[ Squashed a fix from Milian for a problem reported by Ravi, fixed up space damage ]
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0cb4f8bf3ca7..111ae858cbcb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2286,7 +2286,8 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!symbol_conf.inline_name || !map || !sym)
 		return ret;
 
-	addr = map__rip_2objdump(map, ip);
+	addr = map__map_ip(map, ip);
+	addr = map__rip_2objdump(map, addr);
 
 	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
 	if (!inline_node) {
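The distinction between the mapped and the unmapped IP in the commit above can be sketched with a toy model. Everything below is an assumption made for illustration: struct toy_map and both helpers are hypothetical stand-ins rather than perf's own definitions, and the start and pgoff values are made up so that the results line up with the ld-2.28.so addresses quoted in the related patch mail (7f1714e46878 versus 17878).

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory mapping; not perf's struct map. */
struct toy_map {
	uint64_t start;	/* virtual address where the mapping begins */
	uint64_t pgoff;	/* file offset that corresponds to 'start' */
};

/* Assumed translation: virtual ("unmapped") address -> map-relative address. */
uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/* Inverse translation: map-relative address -> virtual address. */
uint64_t toy_unmap_ip(const struct toy_map *map, uint64_t addr)
{
	return addr - map->pgoff + map->start;
}
```

In this model the map-relative value is what a lookup keyed on the binary's own addresses needs, while the virtual address is what should be stored in the callchain cursor and printed to the user.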
[tip:perf/urgent] perf report: Don't try to map ip to invalid map
Commit-ID:  ff4ce2885af8f9e8e99864d78dbeb4673f089c76
Gitweb:     https://git.kernel.org/tip/ff4ce2885af8f9e8e99864d78dbeb4673f089c76
Author:     Milian Wolff
AuthorDate: Wed, 26 Sep 2018 15:52:05 +0200
Committer:  Arnaldo Carvalho de Melo
CommitDate: Thu, 27 Sep 2018 16:05:43 -0300

perf report: Don't try to map ip to invalid map

Fixes a crash when the report encounters an address that could not be associated with an mmaped region:

    #0 0x557bdc4a in callchain_srcline (ip=, sym=0x0, map=0x0) at util/machine.c:2329
    #1 unwind_entry (entry=entry@entry=0x7fff9180, arg=arg@entry=0x75642498) at util/machine.c:2329
    #2 0x558370af in entry (arg=0x75642498, cb=0x557bdb50 , thread=, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
    #3 get_entries (ui=ui@entry=0x7fff9620, cb=0x557bdb50 , arg=0x75642498, max_stack=) at util/unwind-libunwind-local.c:703
    #4 0x55837192 in _unwind__get_entries (cb=, arg=, thread=, data=, max_stack=) at util/unwind-libunwind-local.c:725
    #5 0x557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fff9830, evsel=0x55c7b3b0, cursor=0x75642498, thread=0x55c7f6f0) at util/machine.c:2351
    #6 thread__resolve_callchain (thread=0x55c7f6f0, cursor=0x75642498, evsel=0x55c7b3b0, sample=0x7fff9830, parent=0x7fff97b8, root_al=0x7fff9750, max_stack=127) at util/machine.c:2378
    #7 0x557ba4ee in sample__resolve_callchain (sample=, cursor=, parent=parent@entry=0x7fff97b8, evsel=, al=al@entry=0x7fff9750, max_stack=) at util/callchain.c:1085

Signed-off-by: Milian Wolff
Tested-by: Sandipan Das
Acked-by: Jiri Olsa
Cc: Jin Yao
Cc: Namhyung Kim
Fixes: 2a9d5050dc84 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/machine.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c4acd2001db0..0cb4f8bf3ca7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2312,7 +2312,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
 	const char *srcline = NULL;
-	u64 addr;
+	u64 addr = entry->ip;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
@@ -2324,7 +2324,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
 	 */
-	addr = map__map_ip(entry->map, entry->ip);
+	if (entry->map)
+		addr = map__map_ip(entry->map, entry->ip);
 
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
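The guard added by this commit can be tried out in isolation with a small stand-alone sketch. The types and the function below are hypothetical simplifications, not perf's struct unwind_entry or map__map_ip(); the translation formula is an assumption. Only the pattern comes from the diff: default to the raw IP and translate it only when a map is present.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for perf's map and unwind entry. */
struct toy_region {
	uint64_t start;	/* virtual address where the mapping begins */
	uint64_t pgoff;	/* file offset that corresponds to 'start' */
};

struct toy_entry {
	const struct toy_region *map;	/* NULL if the IP hit no mmaped region */
	uint64_t ip;			/* virtual address of the frame */
};

/*
 * Same shape as the fixed unwind_entry(): start from the raw IP and only
 * translate it when a map exists, so a NULL map is never dereferenced.
 */
uint64_t toy_entry_addr(const struct toy_entry *entry)
{
	uint64_t addr = entry->ip;

	if (entry->map)
		addr = entry->ip - entry->map->start + entry->map->pgoff;

	return addr;
}
```

An entry without a map, such as the frame with ip=18446744073709551615 in the backtrace above, simply keeps its raw IP.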
[PATCH] perf record: use unmapped IP for inline callchain cursors
Only use the mapped IP to find inline frames, but keep using the unmapped IP for the callchain cursor. This ensures we properly show the unmapped IP when displaying a frame we received via the dso__parse_addr_inlines API for a module which does not contain sufficient debug symbols to show the srcline.

Before:

    $ perf record -e cycles:u --call-graph ls
    $ perf script
    ...
    ls 12853 2735.563911: 43354 cycles:u:
        17878 __GI___tunables_init+0x01d1d63a0118 (/usr/lib/ld-2.28.so)
        19ee9 _dl_sysdep_start+0x01d1d63a02e9 (/usr/lib/ld-2.28.so)
        3087 _dl_start+0x01d1d63a0287 (/usr/lib/ld-2.28.so)
        2007 _start+0x01d1d63a0007 (/usr/lib/ld-2.28.so)

After:

    $ perf script
    ...
    ls 12853 2735.563911: 43354 cycles:u:
        7f1714e46878 __GI___tunables_init+0x118 (/usr/lib/ld-2.28.so)
        7f1714e48ee9 _dl_sysdep_start+0x2e9 (/usr/lib/ld-2.28.so)
        7f1714e32087 _dl_start+0x287 (/usr/lib/ld-2.28.so)
        7f1714e31007 _start+0x7 (/usr/lib/ld-2.28.so)

For frames with sufficient debug symbols, the behavior is still sane and works as expected in my tests.

This patch series shows that we desperately need an automated test for inline frame resolution. I'll try to come up with something for the various regressions in the future.

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
Reported-by: Ravi Bangoria
# Tested-by:
# Reviewed-by:
# Suggested-by:
Fixes: bfe16b0653 ("perf report: Don't crash on invalid inline debug information")
---
 tools/perf/util/machine.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 73a651f10a0f..111ae858cbcb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2286,7 +2286,8 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!symbol_conf.inline_name || !map || !sym)
 		return ret;
 
-	addr = map__rip_2objdump(map, ip);
+	addr = map__map_ip(map, ip);
+	addr = map__rip_2objdump(map, addr);
 
 	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
 	if (!inline_node) {
@@ -2317,6 +2318,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
+	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+		return 0;
+
 	/*
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
@@ -2324,9 +2328,6 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (entry->map)
 		addr = map__map_ip(entry->map, entry->ip);
 
-	if (append_inlines(cursor, entry->map, entry->sym, addr) == 0)
-		return 0;
-
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-- 
2.19.0
Re: [RFC 00/10] perf: Add cputime events/metrics
On Thursday, June 7, 2018 1:10:18 AM CEST Andi Kleen wrote:
> > I had some issues with IDLE counter being miscounted due to stopping
> > of the idle tick. I tried to solve it in this patch (it's part of the
> > patchset):
> >
> >   perf/cputime: Don't stop idle tick if there's live cputime event
> >
> > but I'm pretty sure it's wrong and there's better solution.
>
> At least on intel we already have hardware counters for different idle
> states. You just would need to add them and convert to the same
> unit.
>
> But of course it's still useful when this is not available.
>
> > My current plan is now to read those counters in perf top/record/report
> > to show (at least) the idle percentage for the current profile.
>
> It's useful. Thanks for working on it. I was thinking about doing
> something similar for some time.

Hey Jiri,

what happened to this patch series? I also believe it's super useful, even when it's not yet perfect.

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
Re: [PATCH 1/3] perf report: don't try to map ip to invalid map
On Wednesday, September 26, 2018 4:18:19 PM CEST Arnaldo Carvalho de Melo wrote:
> On Wed, Sep 26, 2018 at 03:52:05PM +0200, Milian Wolff wrote:
> > Fixes a crash when the report encounters an address that
> > could not be associated with an mmaped region:
>
> Milian, can you spot which cset introduced this problem? So that we can
> add a "Fixes: sha" tag in this (and the others, if needed) to help the
> stable kernel maintainers to find which kernels this has to be
> backported to?

The issue was introduced by

    perf script: Show correct offsets for DWARF-based unwinding

This in turn got backported already a few times, at which point the 2a9d5050dc84fa2060f08a52f632976923e0fa7e sha was used when referencing the "Upstream commit". Is that enough, or do you need me to find all the backported shas too?

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
[PATCH 1/3] perf report: don't try to map ip to invalid map
Fixes a crash when the report encounters an address that could not be associated with an mmaped region:

    #0 0x557bdc4a in callchain_srcline (ip=, sym=0x0, map=0x0) at util/machine.c:2329
    #1 unwind_entry (entry=entry@entry=0x7fff9180, arg=arg@entry=0x75642498) at util/machine.c:2329
    #2 0x558370af in entry (arg=0x75642498, cb=0x557bdb50 , thread=, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
    #3 get_entries (ui=ui@entry=0x7fff9620, cb=0x557bdb50 , arg=0x75642498, max_stack=) at util/unwind-libunwind-local.c:703
    #4 0x55837192 in _unwind__get_entries (cb=, arg=, thread=, data=, max_stack=) at util/unwind-libunwind-local.c:725
    #5 0x557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fff9830, evsel=0x55c7b3b0, cursor=0x75642498, thread=0x55c7f6f0) at util/machine.c:2351
    #6 thread__resolve_callchain (thread=0x55c7f6f0, cursor=0x75642498, evsel=0x55c7b3b0, sample=0x7fff9830, parent=0x7fff97b8, root_al=0x7fff9750, max_stack=127) at util/machine.c:2378
    #7 0x557ba4ee in sample__resolve_callchain (sample=, cursor=, parent=parent@entry=0x7fff97b8, evsel=, al=al@entry=0x7fff9750, max_stack=) at util/callchain.c:1085

Signed-off-by: Milian Wolff
Cc: Sandipan Das
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/util/machine.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c4acd2001db0..0cb4f8bf3ca7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2312,7 +2312,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
 	const char *srcline = NULL;
-	u64 addr;
+	u64 addr = entry->ip;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
@@ -2324,7 +2324,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
 	 */
-	addr = map__map_ip(entry->map, entry->ip);
+	if (entry->map)
+		addr = map__map_ip(entry->map, entry->ip);
 
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
-- 
2.19.0
[PATCH 1/3] perf report: don't try to map ip to invalid map
Fixes a crash when the report encounters an address that could not be
associated with an mmaped region:

#0  0x557bdc4a in callchain_srcline (ip=, sym=0x0, map=0x0) at util/machine.c:2329
#1  unwind_entry (entry=entry@entry=0x7fff9180, arg=arg@entry=0x75642498) at util/machine.c:2329
#2  0x558370af in entry (arg=0x75642498, cb=0x557bdb50 , thread=, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
#3  get_entries (ui=ui@entry=0x7fff9620, cb=0x557bdb50 , arg=0x75642498, max_stack=) at util/unwind-libunwind-local.c:703
#4  0x55837192 in _unwind__get_entries (cb=, arg=, thread=, data=, max_stack=) at util/unwind-libunwind-local.c:725
#5  0x557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fff9830, evsel=0x55c7b3b0, cursor=0x75642498, thread=0x55c7f6f0) at util/machine.c:2351
#6  thread__resolve_callchain (thread=0x55c7f6f0, cursor=0x75642498, evsel=0x55c7b3b0, sample=0x7fff9830, parent=0x7fff97b8, root_al=0x7fff9750, max_stack=127) at util/machine.c:2378
#7  0x557ba4ee in sample__resolve_callchain (sample=, cursor=, parent=parent@entry=0x7fff97b8, evsel=, al=al@entry=0x7fff9750, max_stack=) at util/callchain.c:1085

Signed-off-by: Milian Wolff
Cc: Sandipan Das
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/util/machine.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c4acd2001db0..0cb4f8bf3ca7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2312,7 +2312,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
 	const char *srcline = NULL;
-	u64 addr;
+	u64 addr = entry->ip;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
@@ -2324,7 +2324,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
 	 */
-	addr = map__map_ip(entry->map, entry->ip);
+	if (entry->map)
+		addr = map__map_ip(entry->map, entry->ip);
 
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
-- 
2.19.0
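The guard in this patch can be illustrated outside of perf with a minimal sketch. Everything below is a stand-in: `struct toy_map`, `toy_map_ip()` and `toy_resolve_addr()` are hypothetical names, and the translation formula (ip - start + pgoff) is an assumption about what the map lookup roughly does, not a copy of the perf code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's struct map; only the fields needed here. */
struct toy_map {
	uint64_t start;	/* first virtual address covered by the mapping */
	uint64_t pgoff;	/* file offset at which the mapping begins */
};

/* Assumed translation: virtual address -> offset inside the mapped file. */
static uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/*
 * Same shape as the patched unwind_entry(): start from the raw ip and
 * only translate it when a map was actually found for the address.
 */
uint64_t toy_resolve_addr(const struct toy_map *map, uint64_t ip)
{
	uint64_t addr = ip;

	if (map)
		addr = toy_map_ip(map, ip);
	return addr;
}
```

With a NULL map the function simply hands back the raw ip (for example the all-ones value seen in frame #2 of the backtrace) instead of dereferencing the missing map.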
[PATCH 2/3] perf report: use the offset address to find inline frames
To correctly find inlined frames, we have to use the file offset
instead of the virtual memory address. This was already fixed for
the display of srcline information in commit 2a9d5050dc84fa20
("perf script: Show correct offsets for DWARF-based unwinding"). We
just need to use the same corrected address also when trying to find
inline frames.

This is another follow-up to commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

Signed-off-by: Milian Wolff
Cc: Sandipan Das
Cc: Arnaldo Carvalho de Melo
---
 tools/perf/util/machine.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0cb4f8bf3ca7..73a651f10a0f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2317,9 +2317,6 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
-	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
-		return 0;
-
 	/*
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
@@ -2327,6 +2324,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (entry->map)
 		addr = map__map_ip(entry->map, entry->ip);
 
+	if (append_inlines(cursor, entry->map, entry->sym, addr) == 0)
+		return 0;
+
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-- 
2.19.0
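Why the order of the two calls matters can be shown with a toy lookup. This is only a sketch under assumptions: the table, the record layout and all `toy_*` names are invented for illustration, and the one entry in the table is made up. The point is that debug information is keyed by offsets into the binary, so a lookup with the raw virtual address misses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory mapping of a binary. */
struct toy_map {
	uint64_t start;	/* virtual address where the mapping begins */
	uint64_t pgoff;	/* file offset where the mapping begins */
};

/* Hypothetical debug-info record: an inlined function at a file offset. */
struct toy_inline {
	uint64_t file_offset;
	const char *name;
};

/* Invented sample data, keyed by file offset like real debug info. */
static const struct toy_inline toy_inlines[] = {
	{ 0xb48, "inlined_helper" },
};

/* Look up an inlined frame by file offset; NULL when nothing matches. */
const char *toy_find_inline(uint64_t file_offset)
{
	size_t i;

	for (i = 0; i < sizeof(toy_inlines) / sizeof(toy_inlines[0]); i++)
		if (toy_inlines[i].file_offset == file_offset)
			return toy_inlines[i].name;
	return NULL;
}

/* Assumed translation from virtual address to file offset. */
uint64_t toy_to_offset(const struct toy_map *map, uint64_t ip)
{
	return map ? ip - map->start + map->pgoff : ip;
}
```

Calling `toy_find_inline()` with the virtual address of a sample finds nothing, while passing the result of `toy_to_offset()` first finds the entry, which mirrors moving `append_inlines()` below the address conversion.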
[PATCH 3/3] perf report: don't crash on invalid inline debug information
When the function name for an inline frame is invalid, we must not try to
demangle this symbol, otherwise we crash with:

#0  0x55895c01 in bfd_demangle ()
#1  0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
#2  dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
#3  0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
#4  inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
#5  0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
#6  0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358

So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.

While this crash was originally reported by Hadrien for rust code, I
can now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:

$ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
main
/usr/include/c++/8.2.1/complex:610
??
/usr/include/c++/8.2.1/complex:618
??
/usr/include/c++/8.2.1/complex:675
??
/usr/include/c++/8.2.1/complex:685
main
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there
which should fix this issue:
https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Signed-off-by: Milian Wolff
Cc: Arnaldo Carvalho de Melo
Reported-by: Hadrien Grasland
---
 tools/perf/util/srcline.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 09d6746e6ec8..e767c4a9d4d2 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -85,6 +85,9 @@ static struct symbol *new_inline_sym(struct dso *dso,
 	struct symbol *inline_sym;
 	char *demangled = NULL;
 
+	if (!funcname)
+		funcname = "??";
+
 	if (dso) {
 		demangled = dso__demangle_sym(dso, 0, funcname);
 		if (demangled)
-- 
2.19.0
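The fallback itself is tiny; a stand-alone sketch of the same idea, with a hypothetical helper name (`toy_inline_name()` is not a perf function), could look like this:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: substitute the conventional "??" placeholder when
 * the debug-info reader could not produce a function name, so that later
 * consumers (a demangler, strcmp, printf) never see a NULL pointer.
 */
const char *toy_inline_name(const char *funcname)
{
	return funcname ? funcname : "??";
}
```

Normalising the name once at the point where the symbol is created keeps every later user of the name free of NULL checks, which is presumably why the patch puts the check at the top of new_inline_sym() rather than in the demangling code.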
Re: [PATCH] perf script: Show correct offsets for DWARF-based unwinding
On Monday, 9 July 2018 16:25:07 CEST Jiri Olsa wrote:
> On Tue, Jul 03, 2018 at 05:35:55PM +0530, Sandipan Das wrote:
>
> SNIP
>
> > After:
> >   # perf report --stdio --no-children -s sym,srcline -g address
> >
> >   # Samples: 1 of event 'probe_libc:inet_pton'
> >   # Event count (approx.): 1
> >   #
> >   # Overhead  Symbol  Source:Line
> >   # ...
> >   #
> >      100.00%  [.] __GI___inet_pton  inet_pton.c
> >
> >              ---gaih_inet.constprop.7 getaddrinfo.c:537
> >                 getaddrinfo getaddrinfo.c:2304
> >                 main ping.c:519
> >                 generic_start_main.isra.0 libc-start.c:308
> >                 __libc_start_main libc-start.c:102
> >
> >   ...
> >
> >   # perf script -F comm,ip,sym,symoff,srcline,dso
> >
> >   ping
> >       7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
> >     inet_pton.c:68
> >       7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
> >     getaddrinfo.c:537
> >       7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
> >     getaddrinfo.c:2304
> >       130782d6f main+0x3df (/usr/bin/ping)
> >     ping.c:519
> >       7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
> >     libc-start.c:308
> >       7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
> >     libc-start.c:102
> >
> > Fixes: 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries")
> > Signed-off-by: Sandipan Das
>
> looks good to me, Milian?
>
> Acked-by: Jiri Olsa

Sorry for the delay, I was on vacation.

The above looks somewhat strange to me - why is there no `(inlined)` suffix
visible anymore?

Also, I can't test this patch locally, since - even without this patch - inline
frame resolution with perf seems to be completely broken for me. It doesn't
seem to be a perf regression - going back in time doesn't resolve this - but
rather a regression in one of its dependencies, or even in the DWARF emitted
by the compilers I have available to test...

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
Re: [RFC PATCH] perf/core: exposing type of context-switch-out event
On Thursday, 1 March 2018 20:36:59 CET Andi Kleen wrote:
> > Please also add documentation Documentation/perf.data-file-format.txt, but
> > I just noticed that not even PERF_RECORD_SWITCH is documented there...
>
> That file only covers fields not generated by the kernel, but this
> is coming from the kernel.
>
> Kernel records are documented in the manpage, but Vince usually updates
> that on his own.

Ah, TIL - thanks for that tip!

But I still think it would be good to have complete documentation of the
perf.data file format in one place. I guess patches would be welcome that add
more aspects of the file format there, even for the parts generated by the
kernel? That helps third-party tools that parse perf.data files (like
perfparser used by QtCreator and hotspot).

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
Re: [RFC PATCH] perf/core: exposing type of context-switch-out event
On Thursday, 1 March 2018 19:08:05 CET Andi Kleen wrote:
> On Thu, Mar 01, 2018 at 06:40:04PM +0300, Alexey Budankov wrote:
> > Hi,
> >
> > This patch prototypes exposing the type of context-switch-out event using
> > PERF_RECORD_MISC_EXT_RESERVED bit for PERF_RECORD_SWITCH[_CPU_WIDE]
> > records.
>
> It would be better to define an actually named bit in perf_event.h.
> It can be the same value.
>
> Also we would need a patch for perf script / perf report -D to print this
> information.
>
> The rest looks good to me.

Please also add documentation to Documentation/perf.data-file-format.txt, but I
just noticed that not even PERF_RECORD_SWITCH is documented there...

Otherwise I also think that this would be a very nice feature addition!

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
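Andi's suggestion of a named bit with the same value can be sketched as follows. All names here are hypothetical (`TOY_*` is used on purpose so nothing is mistaken for a real perf_event.h definition), and the bit position is an assumption made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption for illustration: the reserved flag is bit 15 of the 16-bit
 * misc field in the record header. TOY_MISC_EXT_RESERVED stands in for
 * the reserved bit that the RFC reuses.
 */
#define TOY_MISC_EXT_RESERVED		(1u << 15)

/* Hypothetical descriptive alias: a new name, the same value. */
#define TOY_MISC_SWITCH_OUT_TYPE	TOY_MISC_EXT_RESERVED

/* Hypothetical helper: does a switch-out record carry the extra flag? */
bool toy_switch_out_flagged(uint16_t misc)
{
	return (misc & TOY_MISC_SWITCH_OUT_TYPE) != 0;
}
```

Because the alias expands to the same value, existing perf.data files and parsers that only know the reserved bit keep working, while new code can test the flag under a self-explanatory name.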
Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint
On Tuesday, January 16, 2018 1:40:38 PM CET Cheng Jian wrote:
> When i use perf to trace the sched_wakeup_new tracepoint, there is
> a bug that output the same event repetitiously.
> It can be reproduced by :
>
> #./test_fork
> parent pid : 1059
> child pid : 1060
> #perf record -e sched:sched_wakeup_new -p 1060
>
> test_fork is an demo that can generating wakeup_new event, parent
> process does nothing but fork a child process, and then they both
> quit.
>
> There are 4 processors in this machine. before this patch,
> perf script(perf-1058, parent-1059, child-1060) :
>
> test_fork 1059 [001] 62.913689: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> test_fork 1059 [001] 62.913698: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> test_fork 1059 [001] 62.913705: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
>
> but ftrace report this event only once :
>
> test_fork-1059 [002] d... 62.913680: sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
>
> perf script print the same wakeup_new event multiple times.
>
> These events which trigger this issue all specify a target process.
> commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
> for events") has designed a method to trace these events. For
> example, the sched_wakeup and sched_wakeup_new tracepoint will be
> caught when the current task wakeup a target task.
>
> These events are registered as per cpu most of the time and attached
> to the task too, we will get all of them from the perf_event_context
> of this task, they will be matched success but are all the same event.
> So check the cpu number of this event to avoid matching them multiple
> times.
>
> after this patch, perf script(parent-1040, child-1041):
>
> test_fork 1040 [002] 36.536079: sched:sched_wakeup_new: comm=test_fork pid=1041 prio=120 target_cpu=003
>
> It will match it only once for tracing task(child-1041).

Oh, this sounds awesome. I don't have the setup available to compile a kernel
with this patch applied, but I think from the description it solves a
long-standing issue with perf's sleep-time profiling. Can someone try this
please:

https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times

Use 'sleep 1' as the debuggee. On my system, I get the period multiplied by
nproc like you describe:

```
$ perf-sleep-record sleep 1
..
$ perf report --stdio --show-total-period | grep "Event count"
..
# Event count (approx.): 8000845488
$ nproc
8
```

The sleep-record script is available at:

https://github.com/milianw/shell-helpers/blob/master/perf-sleep-record

I believe your patch also fixes the sched_stat_* tracepoints to be only emitted
once per CPU. Can you verify this? I.e. is the period finally correctly
calculated and we get a value of roughly 1E9ns == 1s?

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
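The arithmetic behind the "period multiplied by nproc" observation can be checked with a few lines of C. The helper name is hypothetical; the only inputs are the two numbers quoted in the mail (event count 8000845488, nproc 8).

```c
#include <stdint.h>

/*
 * Hypothetical helper: if every sample was reported once per CPU, dividing
 * the reported total by the CPU count recovers the real period.
 */
uint64_t toy_period_per_cpu(uint64_t total_period_ns, unsigned int nproc)
{
	return nproc ? total_period_ns / nproc : total_period_ns;
}
```

For the numbers above, 8000845488 / 8 = 1000105686 ns, i.e. about 1.0001 s, which matches the expected `sleep 1` duration and supports the theory that each event was counted once per CPU.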
Re: [PATCH AUTOSEL for 4.14 18/51] perf callchain: Compare symbol name for inlined frames when matching
On Wednesday, November 22, 2017 11:25:40 PM CET alexander.le...@verizon.com wrote:
> From: Milian Wolff <milian.wo...@kdab.com>
>
> [ Upstream commit 9856240ad3269f2fdab0b2fa4400ef8aab792061 ]

Hello Alexander,

this is the first time I encounter AUTOSEL. I just want to check: The patch
below depends on others in a whole series that reworks the handling of inline
frames. Why is only this one getting selected? I don't even think it can work
stand-alone?

Thanks

> The fake symbols we create for inlined frames will represent different
> functions but can use the symbol start address. This leads to issues
> when different inline branches all lead to the same function.
>
> Before:
> ~
> $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
> ...
>     --38.86%--_start
>               __libc_start_main
>               main
>               |
>                --37.57%--std::norm (inlined)
>                          std::_Norm_helper::_S_do_it (inlined)
>                          |
>                           --36.36%--std::abs (inlined)
>                                     std::__complex_abs (inlined)
>                                     |
>                                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
>                                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
>                                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
> ~
>
> Note that this backtrace representation is completely bogus.
> Complex abs does not call the linear congruential engine! It
> is just a side-effect of a longer inlined stack being appended
> to a shorter, different inlined stack, both of which originate
> in the same function (main).
>
> This patch fixes the issue:
>
> ~
> $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
> ...
>     --38.86%--_start
>               __libc_start_main
>               main
>               |
>               |--35.59%--std::uniform_real_distribution::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
>               |          std::uniform_real_distribution::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
>               |          |
>               |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
>               |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
>               |                     |
>               |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
>               |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
>               |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
>               |
>                --1.99%--std::norm (inlined)
>                          std::_Norm_helper::_S_do_it (inlined)
>                          std::abs (inlined)
>                          std::__complex_abs (inlined)
> ~
>
> Signed-off-by: Milian Wolff <milian.wo...@kdab.com>
> Reviewed-by: Jiri Olsa <jo...@redhat.com>
> Reviewed-by
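The matching rule described in the quoted commit message can be sketched in isolation. This is a simplified illustration, not the perf implementation: `struct toy_sym`, `enum toy_match` and `toy_match_syms()` are hypothetical names, and real callchain matching considers more than these two fields.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced symbol: just what the comparison needs. */
struct toy_sym {
	uint64_t start;		/* symbol start address */
	const char *name;	/* symbol name */
	bool inlined;		/* true for a fake symbol of an inlined frame */
};

enum toy_match { TOY_MATCH_EQ, TOY_MATCH_LT, TOY_MATCH_GT };

/*
 * Fake symbols for inlined frames may all share the start address of the
 * function they were inlined into, so the address cannot tell them apart.
 * Compare such frames by name, and everything else by start address.
 */
enum toy_match toy_match_syms(const struct toy_sym *l, const struct toy_sym *r)
{
	if (l->inlined || r->inlined) {
		int cmp = strcmp(l->name, r->name);

		if (cmp == 0)
			return TOY_MATCH_EQ;
		return cmp < 0 ? TOY_MATCH_LT : TOY_MATCH_GT;
	}

	if (l->start == r->start)
		return TOY_MATCH_EQ;
	return l->start < r->start ? TOY_MATCH_LT : TOY_MATCH_GT;
}
```

With an address-only comparison, two different inlined frames that share the start address of main would be merged into one callchain node, which is the bogus "abs calls the linear congruential engine" output shown in the "Before" graph.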
Re: [RFC] perf script: modify field selection option
On Monday, 20 November 2017 21:53:04 CET Stephane Eranian wrote:
> Hi,
>
> I have been using the perf script -F option on the latest perf and I
> find it not very convenient to use. I appreciate the + and - prefix to
> field names to add or suppress them. But most of the time, I want to
> print only one or two fields and I have to guess which ones are there
> by default so I can suppress them. I think there should be a way to
> say: start from no fields. I understand why you have default to
> maintain compatibility with older perf script but I would like a
> syntax to say: remove defaults. For instance:
>
> $ perf script -F --,+ip,+syms .
>
> Where -- would mean drop all defaults.
>
> Any better suggestions?

Isn't `perf script -F ip,sym` what you want? Note the lack of any '+':

$ perf script -F ip,sym | head -n 5
206aad x86_pmu_enable
380591 ctx_resched
380b46 __perf_event_enable
378716 event_function

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
Re: [GIT PULL 00/15] perf/core inlining improvements
On Wednesday, 25 October 2017 17:59:58 CEST Arnaldo Carvalho de Melo wrote:
> Hi Ingo,
>
> Please consider pulling, this is Milian's v7 plus some fixes
> acked by Namhyung after some discussion among the three of us, I
> probably need to pick some more patches that are related to this area,
> but lets make some progress and merge this kit,

Thanks a lot to everyone involved in reviewing this series. Much appreciated!

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
[tip:perf/core] perf util: Enable handling of inlined frames by default
Commit-ID:  d8a88dd243a170a226aba33e7c53704db2f82aa6
Gitweb:     https://git.kernel.org/tip/d8a88dd243a170a226aba33e7c53704db2f82aa6
Author:     Milian Wolff <milian.wo...@kdab.com>
AuthorDate: Thu, 19 Oct 2017 13:38:36 +0200
Committer:  Arnaldo Carvalho de Melo <a...@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:47 -0300

perf util: Enable handling of inlined frames by default

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable this
useful option by default.

Suggested-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Milian Wolff <milian.wo...@kdab.com>
Reviewed-by: Andi Kleen <a...@linux.intel.com>
Cc: David Ahern <dsah...@gmail.com>
Cc: Jin Yao <yao@linux.intel.com>
Cc: Jiri Olsa <jo...@kernel.org>
Cc: Namhyung Kim <namhy...@kernel.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/Documentation/perf-script.txt | 3 ++-
 tools/perf/util/symbol.c                 | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d..ddde2b5 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]

 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry is function name or file/line.
+	will be printed. Each entry is function name or file/line. Enabled by
+	default, disable with --no-inline.

 include::callchain-overhead-calculation.txt[]

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba3..25e6773 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]

 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
-	will be printed. Each entry has function name and file/line.
+	will be printed. Each entry has function name and file/line. Enabled by
+	default, disable with --no-inline.

 SEE ALSO
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38a..ce6993b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
 	.show_hist_headers	= true,
 	.symfs			= "",
 	.event_group		= true,
+	.inline_name		= true,
 };

 static enum dso_binary_type binary_type_symtab[] = {
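The pattern in this patch (flip a default in a statically initialized configuration struct and document an opt-out flag) can be sketched in isolation. The code below is a minimal illustration, not perf code: `struct tool_conf`, `default_conf` and `parse_args` are hypothetical names, and the hand-rolled argument scan merely stands in for whatever option parsing the real tool does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a tool configuration; not perf's struct symbol_conf. */
struct tool_conf {
	bool show_headers;
	bool inline_name;
};

/* Defaults set with designated initializers, in the style of the patch. */
static const struct tool_conf default_conf = {
	.show_headers	= true,
	.inline_name	= true,	/* on unless the user opts out */
};

/*
 * Return the effective configuration after scanning argv for the
 * --inline / --no-inline pair. Later flags override earlier ones.
 */
static struct tool_conf parse_args(int argc, const char *const argv[])
{
	struct tool_conf conf = default_conf;

	assert(argc == 0 || argv != NULL);
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "--inline") == 0)
			conf.inline_name = true;
		else if (strcmp(argv[i], "--no-inline") == 0)
			conf.inline_name = false;
	}
	return conf;
}
```

With no flags the sketch reports `inline_name` as true, and passing `--no-inline` turns it off, which matches the behaviour the updated documentation text describes.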