Re: [PATCH] perf script: Fix LBR skid dump problems in brstackinsn

2018-12-11 Thread Milian Wolff
On Thursday, 6 December 2018 23:52:07 CET Andi Kleen wrote:
> On Thu, Dec 06, 2018 at 06:29:20PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Dec 06, 2018 at 12:51:48PM -0800, Andi Kleen wrote:
> > > On Thu, Dec 06, 2018 at 02:01:40PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Mon, Nov 19, 2018 at 09:06:17PM -0800, Andi Kleen wrote:
> > > > > From: Andi Kleen 
> > > > > 
> > > > > This is a fix for another instance of the skid problem Milian
> > > > > recently found [1]
> > 
> > I think you forgot to add the reference, i.e. what is the url or
> > message-id that this [1] refers to?
> 
> Hmm, I thought I saw some patches from Milian for this earlier,
> but now I can't find them. Perhaps I misremember. Milian
> can point to them if they exist and are not just a figment
> of my imagination :-)

I only have very early POC patches, cf.: https://lkml.org/lkml/2018/11/14/608

I've now also pushed that on my WIP branch:
https://github.com/milianw/linux/tree/pebs-callchain-breakage

I haven't had the time since to work on this. The patches as-is are not 
upstreamable. There are some open questions on my side (see mail referenced 
above).

> These were the changes to report the stack frame RIP/RSP in the PEBS
> handler and use it for unwinding in perf.

Yes, I was looking at something different. I've no experience with brstackinsn 
usage in perf, so I can't really add my tested-by.

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts



[tip:perf/core] perf script: Share code and output format for uregs and iregs output

2018-11-21 Thread tip-bot for Milian Wolff
Commit-ID:  9add8fe8e6f63db47e40e65173530dcb68cd7a07
Gitweb: https://git.kernel.org/tip/9add8fe8e6f63db47e40e65173530dcb68cd7a07
Author: Milian Wolff 
AuthorDate: Wed, 7 Nov 2018 23:34:37 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 21 Nov 2018 12:00:32 -0300

perf script: Share code and output format for uregs and iregs output

The iregs output was missing the newline at the end as well as the leading
ABI output. This made it hard to compare the iregs and uregs values.
Instead, use a single function to output the register values and use it
for both iregs and uregs, to ensure the output is consistent.

Before:

  perf  7049 [-01]  1343.354347:  1 cycles:ppp:
  a7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
   AX:0x8000 BX:0x0 CX:0x0 DX:0x7 SI:0xf DI:0x286 BP:0x95bc8213a460 SP:0xacbf0ba97d18 IP:0xa7bc21cd FLAGS:0x28e CS:0x10 SS:0x18 R8:0x2 R9:0x21440 R10:0x33816fb3b8c R11:0x1 R12:0x95bc8213a460 R13:0x95bc8213a400 R14:0x95bc8213a400 R15:0x1 ABI:2 AX:0xffda BX:0x CX:0x7f84ad85798b DX:0x560209699d50 SI:0x7ffe2c7a6820 DI:0x7ffe2c7a8c9b BP:0x7ffe2c7a20d0 SP:0x7ffe2c7a2058 IP:0x7f84ad85798b FLAGS:0x206 CS:0x33 SS:0x2b R8:0x7ffe2c7a2030 R9:0x7f84ae55f010 R10:0x8 R11:0x206 R12:0x R13:0x R14:0x R15:0x

  perf  7049 [-01]  1343.354363:  1 cycles:ppp:
...

After:

  perf  7049 [-01]  1343.354347:  1 cycles:ppp:
  a7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
  a840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
   ABI:2 AX:0x8000 BX:0x0 CX:0x0 DX:0x7 SI:0xf DI:0x286 BP:0x95bc8213a460 SP:0xacbf0ba97d18 IP:0xa7bc21cd FLAGS:0x28e CS:0x10 SS:0x18 R8:0x2 R9:0x21440 R10:0x33816fb3b8c R11:0x1 R12:0x95bc8213a460 R13:0x95bc8213a400 R14:0x95bc8213a400 R15:0x1
   ABI:2 AX:0xffda BX:0x CX:0x7f84ad85798b DX:0x560209699d50 SI:0x7ffe2c7a6820 DI:0x7ffe2c7a8c9b BP:0x7ffe2c7a20d0 SP:0x7ffe2c7a2058 IP:0x7f84ad85798b FLAGS:0x206 CS:0x33 SS:0x2b R8:0x7ffe2c7a2030 R9:0x7f84ae55f010 R10:0x8 R11:0x206 R12:0x R13:0x R14:0x R15:0x

  perf  7049 [-01]  1343.354363:  1 cycles:ppp:
...

Signed-off-by: Milian Wolff 
Acked-by: Jiri Olsa 
Link: http://lkml.kernel.org/r/20181107223437.9071-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 40 +---
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index daf73832743e
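The diff body is truncated in this archive copy. As a rough illustration of what the commit message describes (one register printer shared by iregs and uregs, so both always start with the ABI field and end with a newline), a sketch could look like the following. The struct, the four-entry register name table and the function names are hypothetical and are not the actual builtin-script.c code:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register sample, loosely modelled on perf's layout. */
struct regs_sample {
	uint64_t abi;          /* printed as "ABI:<n>" */
	uint64_t mask;         /* bit i set => register i was sampled */
	const uint64_t *regs;  /* sampled values, packed in mask-bit order */
};

/* Illustrative subset of x86 register names, indexed by mask bit. */
static const char *const reg_names[] = { "AX", "BX", "CX", "DX" };

/* Single printer used for both iregs and uregs, so both get the
 * leading ABI field and the trailing newline. */
static int fprintf_regs(const struct regs_sample *s, FILE *fp)
{
	int printed = fprintf(fp, "ABI:%" PRIu64 " ", s->abi);
	size_t i, idx = 0;

	for (i = 0; i < sizeof(reg_names) / sizeof(reg_names[0]); i++) {
		if (!(s->mask & (1ULL << i)))
			continue;
		printed += fprintf(fp, "%5s:0x%" PRIx64 " ",
				   reg_names[i], s->regs[idx++]);
	}

	printed += fprintf(fp, "\n");
	return printed;
}

/* Both entry points delegate, so their output cannot drift apart. */
static int fprintf_iregs(const struct regs_sample *s, FILE *fp)
{
	return fprintf_regs(s, fp);
}

static int fprintf_uregs(const struct regs_sample *s, FILE *fp)
{
	return fprintf_regs(s, fp);
}
```

Routing both entry points through one helper is the point of the change: any later tweak to the format applies to iregs and uregs alike, and each call emits exactly one line.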

[tip:perf/core] perf script: Add newline after uregs output

2018-11-21 Thread tip-bot for Milian Wolff
Commit-ID:  b07d16f7e9e4cf2562f61b5f68a4b0831fe5ef14
Gitweb: https://git.kernel.org/tip/b07d16f7e9e4cf2562f61b5f68a4b0831fe5ef14
Author: Milian Wolff 
AuthorDate: Wed, 7 Nov 2018 10:37:05 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 21 Nov 2018 12:00:31 -0300

perf script: Add newline after uregs output

This change makes it much easier to distinguish between consecutive
samples by keeping the empty line between them, as we see when uregs
output is not enabled.

Before:

  cpp-inlining 28298 [-01] 54837.342780:3068085 cycles:pp:
  77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
  ...
   ABI:2AX:0x0BX:0x40f56cf6CX:0x294a3ae7...
  cpp-inlining 28298 [-01] 54837.344493:2881929 cycles:pp:
  77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
  ...
   ABI:2AX:0x40d440c7BX:0x40d440c7CX:0x4d45e5da...

After:

  cpp-inlining 28298 [-01] 54837.342780:3068085 cycles:pp:
  77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
  ...
   ABI:2AX:0x0BX:0x40f56cf6CX:0x294a3ae7...

  cpp-inlining 28298 [-01] 54837.344493:2881929 cycles:pp:
  77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
  ...
   ABI:2AX:0x40d440c7BX:0x40d440c7CX:0x4d45e5da...

Signed-off-by: Milian Wolff 
Cc: Jiri Olsa 
Link: http://lkml.kernel.org/r/20181107093705.16346-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index b5bc85bd0bbe..daf73832743e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -603,6 +603,8 @@ static int perf_sample__fprintf_uregs(struct perf_sample *sample,
 		printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), val);
 	}
 
+	fprintf(fp, "\n");
+
 	return printed;
 }
 


Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-15 Thread Milian Wolff
On Thursday, 15 November 2018 03:05:32 CET Travis Downs wrote:
> On Wed, Nov 14, 2018 at 8:20 AM Milian Wolff  wrote:
> > 3) I suggest we always keep the first frame and sample IP from the user
> > regs, i.e. the accurate PEBS/IBS IP. Then we add the following frames
> > from unwinding the ustack with the iregs.
> 
> Does this mean that the displayed unwind will sometimes be
> "impossible" to have actually been generated from a consistent execution
> of the user program?

Yes, that is exactly what I'm saying.

> For example, the top frame (from PEBS) and second frame (from iregs)
> may be inconsistent in that the latter function never calls the first.
> At this point it would be good to have an indication that the top frame
> is from a different source than the rest of the frames, lest the user
> pull their hair out trying to determine how function X seems to call
> function Y despite that not being the case in the source.

I agree. I personally like your suggested approach - only add an indication 
when the IP differs so much that it points to a different function. What do 
others say to this?

Cheers



Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-14 Thread Milian Wolff
 [-01]57.870061: 701199 cycles:pppu: 
   7fc1042797b4 __hypot_finite+0x154 (/usr/lib/libm-2.28.so)
   7fc1042797b5 __hypot_finite+0x155 (/usr/lib/libm-2.28.so)
   7fc10425faf8 hypotf32x+0x18 (/usr/lib/libm-2.28.so) (unwind ip 
differs)
   5622c7452128 main+0x88 (/tmp/cpp-inlining)
   7fc104096222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
   5622c74521ed _start+0x2d (/tmp/cpp-inlining)
```

But always skipping the IP is also sometimes wrong, like in this case:

```
cpp-inlining  2605 [-01]57.862313: 694984 cycles:pppu: 
   7fc1042797b9 __hypot_finite+0x159 (/usr/lib/libm-2.28.so)
   5622c7452128 main+0x88 (/tmp/cpp-inlining)
   7fc104096222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
   5622c74521ed _start+0x2d (/tmp/cpp-inlining)
```

Here, we are missing the hypotf32x call in between __hypot_finite and main.

Do we want to introduce some heuristic for how to handle these scenarios? I.e. if
uregs->ip and iregs->ip point to the same function symbol, then skip the frame
for iregs->ip, otherwise add it?
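A minimal sketch of that heuristic, assuming a symbol table of [start, end) address ranges and an unwound chain whose first entry is iregs->ip. struct sym, struct frame, sym_lookup() and merge_callchain() are hypothetical names, not existing perf code, and the ip_differs flag is just one possible way to surface the inconsistency in the output:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol entry: one function covering [start, end). */
struct sym {
	uint64_t start, end;
	const char *name;
};

/* Hypothetical output frame; ip_differs marks an iregs frame that
 * resolves to a different function than the precise sample IP. */
struct frame {
	uint64_t ip;
	bool ip_differs;
};

/* Linear lookup for brevity; returns NULL if ip is not covered. */
static const struct sym *sym_lookup(const struct sym *syms, size_t n,
				    uint64_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (ip >= syms[i].start && ip < syms[i].end)
			return &syms[i];
	return NULL;
}

/*
 * Frame 0 is the precise PEBS/IBS IP from uregs. unwound[] is the chain
 * obtained by unwinding with iregs, so unwound[0] == iregs->ip. That
 * first unwound frame is dropped when it resolves to the same function
 * as the sample IP, and kept (flagged) otherwise, including when either
 * IP has no symbol.
 */
static size_t merge_callchain(const struct sym *syms, size_t nsyms,
			      uint64_t sample_ip,
			      const uint64_t *unwound, size_t nunwound,
			      struct frame *out, size_t max_out)
{
	size_t n = 0, i = 0;

	if (max_out == 0)
		return 0;
	out[n++] = (struct frame){ .ip = sample_ip, .ip_differs = false };

	if (nunwound > 0) {
		const struct sym *a = sym_lookup(syms, nsyms, sample_ip);
		const struct sym *b = sym_lookup(syms, nsyms, unwound[0]);

		if (!(a && a == b) && n < max_out)
			out[n++] = (struct frame){ .ip = unwound[0],
						   .ip_differs = true };
		i = 1;
	}

	for (; i < nunwound && n < max_out; i++)
		out[n++] = (struct frame){ .ip = unwound[i], .ip_differs = false };
	return n;
}
```

With made-up symbol ranges, the first trace above (0x154 and 0x155, both in __hypot_finite) would lose the duplicate frame, while the second would keep the hypotf32x frame, assuming iregs->ip fell into hypotf32x there.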

Thanks

From 422d2a95eff344407ec425f0de55b264841d1757 Mon Sep 17 00:00:00 2001
From: Milian Wolff 
Date: Wed, 14 Nov 2018 14:10:47 +0100
Subject: [PATCH 1/2] [WIP] perf: make it possible to collect both, iregs and
 uregs

Previously, a single set of registers was stored in the perf
data for both the user and the interrupt registers. Now, two
distinct sets can be sampled.

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
Cc: Andi Kleen 
Cc: Jiri Olsa 
---
 arch/x86/events/amd/ibs.c|  2 +-
 arch/x86/events/core.c   |  2 +-
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   |  7 +++
 arch/x86/events/intel/knc.c  |  2 +-
 arch/x86/events/intel/p4.c   |  2 +-
 arch/x86/kernel/ptrace.c |  2 +-
 arch/x86/kvm/pmu.c   |  4 ++--
 drivers/oprofile/nmi_timer_int.c |  2 +-
 include/linux/perf_event.h   | 18 +++--
 kernel/events/core.c | 34 
 kernel/trace/bpf_trace.c |  2 +-
 kernel/watchdog_hld.c|  2 +-
 13 files changed, 43 insertions(+), 38 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4dc0650..567db8878511 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -670,7 +670,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		data.raw = 
 	}
 
-	throttle = perf_event_overflow(event, , );
+	throttle = perf_event_overflow(event, , , iregs);
 out:
 	if (throttle)
 		perf_ibs_stop(event, 0);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 106911b603bd..acdcafa57ca0 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1493,7 +1493,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 		if (!x86_perf_event_set_period(event))
 			continue;
 
-		if (perf_event_overflow(event, , regs))
+		if (perf_event_overflow(event, , regs, regs))
 			x86_pmu_stop(event, 0);
 	}
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 273c62e81546..2156620b3d9e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2299,7 +2299,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		if (has_branch_stack(event))
 			data.br_stack = >lbr_stack;
 
-		if (perf_event_overflow(event, , regs))
+		if (perf_event_overflow(event, , regs, regs))
 			x86_pmu_stop(event, 0);
 	}
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index b7b01d762d32..018fc0649033 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -639,7 +639,7 @@ int intel_pmu_drain_bts_buffer(void)
 	 * the sample.
 	 */
 	rcu_read_lock();
-	perf_prepare_sample(, , event, );
+	perf_prepare_sample(, , event, , );
 
 	if (perf_output_begin(, event, header.size *
 			  (top - base - skip)))
@@ -1273,7 +1273,6 @@ static void setup_pebs_sample_data(struct perf_event *event,
 		set_linear_ip(regs, pebs->ip);
 	}
 
-
 	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
 	x86_pmu.intel_cap.pebs_format >= 1)
 		data->addr = pebs->dla;
@@ -1430,7 +1429,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 
 	while (count > 1) {
 		setup_pebs_sample_data(event, iregs, at, , );
-		perf_event_output(event, , );
+		perf_event_output(event, , , iregs);
 		at += x86_pmu.pebs_record_size;
 		at = get_next_pebs_record_by_bit(at, top, bit);
 		count--;
@@ -1442,7 +1441,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	 * All but the last records are processed.
 	 * The last one is left to be able to call the overflow handler.
 	 */
-	if (perf_event_overfl


Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-08 Thread Milian Wolff
On Wednesday, 7 November 2018 23:41:31 CET Milian Wolff wrote:
> On Tuesday, 6 November 2018 21:24:11 CET Andi Kleen wrote:
> > > Where would I look for the source to change here? So far, I only
> > > concentrated on the userspace side of perf in tools/perf.
> > 
> > Kind of similar to
> > 
> > a405bad5ad20 perf/x86: Add Haswell specific transaction flag reporting
> > fdfbbd07e91f perf: Add generic transaction flags
> > 
> > Report the original (not overwritten) regs->ip and regs->sp
> 
> Thanks a lot Andi! With your help, I have managed to find the exact issue
> for my scenario. Turns out, it really is "just" the instruction pointer
> that is wrong. I.e. originally we have IP = 0x7feda32ca68c, but with PEBS
> we correct that to IP = 0x7feda32ca688. The SP register value stays the same
> according to my printk output. Using the original IP value, we can unwind
> correctly since we point to the correct place in the .eh_frame section. The
> PEBS IP points to a different position in the .eh_frame section, which is
> "too early".
> 
> That brings up some questions:
> 
> - I noticed `perf record --intr-regs`, but the values recorded in the
> perf.data file are always the same. I.e. comparing uregs and iregs, I always
> see the same values printed by `perf script`. This smells like a bug to me,
> but so far I haven't figured out why this happens...

The reason seems to be that perf_event_output only takes one set of registers, 
which then gets handed down into perf_prepare_sample where it gets sampled. 
Thus, if the sample type has both PERF_SAMPLE_REGS_USER and PERF_SAMPLE_REGS_INTR 
set, then by design both will store the same values for user space samples.

Can we change this, such that perf_event_output also takes a second set of 
registers (iregs) that get sampled for PERF_SAMPLE_REGS_INTR? I'm very new to 
real kernel development, what kind of ABI/API stability guarantees exist for 
something like "perf_event_output"?
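To make the difference concrete, here is a heavily simplified model (it ignores how the kernel selects the user register set for samples taken in kernel mode). The flags, structs and prepare_sample_*() functions are hypothetical stand-ins for perf_prepare_sample() and the PERF_SAMPLE_REGS_USER/PERF_SAMPLE_REGS_INTR bits, not the real kernel interfaces:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_SAMPLE_REGS_USER/INTR bits. */
#define SAMPLE_REGS_USER (1u << 0)
#define SAMPLE_REGS_INTR (1u << 1)

struct regs {
	uint64_t ip, sp;
};

struct sample {
	struct regs user; /* section written for SAMPLE_REGS_USER */
	struct regs intr; /* section written for SAMPLE_REGS_INTR */
};

/* Current shape (simplified): one regs pointer feeds both sections,
 * so for a user-space sample they always end up identical. */
static void prepare_sample_one(struct sample *s, unsigned int type,
			       const struct regs *regs)
{
	if (type & SAMPLE_REGS_USER)
		s->user = *regs;
	if (type & SAMPLE_REGS_INTR)
		s->intr = *regs;
}

/* Proposed shape: a second pointer (iregs) feeds the INTR section. */
static void prepare_sample_two(struct sample *s, unsigned int type,
			       const struct regs *regs,
			       const struct regs *iregs)
{
	if (type & SAMPLE_REGS_USER)
		s->user = *regs;
	if (type & SAMPLE_REGS_INTR)
		s->intr = *iregs;
}
```

For example, with the PEBS-corrected IP 0x7feda32ca688 in regs and the original 0x7feda32ca68c in iregs, only prepare_sample_two() keeps the two sections apart.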

> - Independently, when I add a custom printk manually in
> `arch/x86/events/intel/ds.c` at the end of `setup_pebs_sample_data`, then I'm never seeing
> any differences between SP in iregs/pebs/regs. Shouldn't it also be
> recorded via PEBS? Or is it just chance that I'm never seeing any
> difference in setup_pebs_sample_data between iregs->sp and regs->sp?

The reason here seems to be that the registers stored in "pebs" are 
essentially the same as iregs for the setup for `perf record --call-graph 
dwarf`. The difference is the availability of `pebs->real_ip` which gets used 
on my system to fixup the IP. SP stays untouched and is thus only truly valid 
for the untouched IP (which is discarded currently - see above).
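Why the stale SP breaks DWARF unwinding can be shown with a toy CFI lookup: the unwinder picks the CFA rule by IP but applies it to SP. The rows below are invented, as if the 4-byte instruction at ...688 were a `sub $0x20,%rsp`; struct cfi_row, cfi_find() and cfa_for() are hypothetical helpers, not part of any unwinder library, and real .eh_frame rows carry more than a CFA offset:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CFI row: from address `start` onward the
 * CFA is SP + cfa_offset (other register rules are left out). */
struct cfi_row {
	uint64_t start;
	int64_t cfa_offset;
};

/* Rows sorted by start; return the last row with start <= ip. */
static const struct cfi_row *cfi_find(const struct cfi_row *rows, size_t n,
				      uint64_t ip)
{
	const struct cfi_row *hit = NULL;

	for (size_t i = 0; i < n && rows[i].start <= ip; i++)
		hit = &rows[i];
	return hit;
}

/* Canonical frame address for an (ip, sp) pair; 0 if no row covers ip. */
static uint64_t cfa_for(const struct cfi_row *rows, size_t n,
			uint64_t ip, uint64_t sp)
{
	const struct cfi_row *r = cfi_find(rows, n, ip);

	return r ? sp + (uint64_t)r->cfa_offset : 0;
}
```

With rows { 0x7feda32ca680: +0x8 } and { 0x7feda32ca68c: +0x28 } and the same SP, the original IP 0x7feda32ca68c yields CFA = SP + 0x28, whereas the PEBS IP 0x7feda32ca688 yields SP + 0x8, i.e. 0x20 bytes too low, so the return address would be read from the wrong stack slot.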

> - Generally, how do we want to handle this bug? If `--intr-regs` would
> actually record a different IP than stored in uregs in the perf.data file,
> then we could use that as a fallback for unwinding, when it fails the first
> time. Or should we always unwind from that IP? How do we mark the "actual"
> frame/IP then, if that differs?
> 
> Thanks




Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-08 Thread Milian Wolff
On Mittwoch, 7. November 2018 23:41:31 CET Milian Wolff wrote:
> On Dienstag, 6. November 2018 21:24:11 CET Andi Kleen wrote:
> > > Where would I look for the source to change here? So far, I only
> > > concentrated on the userspace side of perf in tools/perf.
> > 
> > Kind of similar to
> > 
> > a405bad5ad20 perf/x86: Add Haswell specific transaction flag reporting
> > fdfbbd07e91f perf: Add generic transaction flags
> > 
> > Report the original (not overwritten) regs->ip and regs->sp
> 
> Thanks a lot Andi! With your help, I have managed to find the exact issue
> for my scenario. Turns out, it really is "just" the instruction pointer
> that is wrong. I.e. originally we have IP = 0x7feda32ca68c, but with PEBS
> we correct that to IP = 7feda32ca688. The SP register value stays the same
> according to my printk output. Using the original IP value, we can unwind
> correctly since we point to the correct place in the .eh_frame section. The
> PEBS IP points to a different position in the .eh_frame section, which is
> "too early".
> 
> That brings up some questions:
> 
> - I noticed `perf record --intr-regs`, but the values recorded in the
> perf.data file are always the same. I.e. comparing uregs and iregs, I always
> see the same values printed by `perf script`. This smells like a bug to me,
> but so far I haven't figured out why this happens...

The reason seems to be that perf_event_output only takes one set of registers, 
which is then handed down into perf_prepare_sample, where it gets sampled. 
Thus, if the sample type has both PERF_SAMPLE_REGS_USER and PERF_SAMPLE_REGS_INTR 
set, then by design both will store the same values for user-space samples.
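The data flow described above can be illustrated with a minimal, self-contained C model. This is not kernel code: `struct model_regs`, `struct model_sample` and `model_prepare_sample()` are hypothetical stand-ins for `struct pt_regs`, the register parts of a sample and `perf_prepare_sample`. The model assumes, as described above, that only one register set is handed down and that a user-mode sample takes its user registers from that same set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct pt_regs. */
struct model_regs {
	uint64_t ip;
	uint64_t sp;
	bool user_mode; /* true if the interrupted code ran in user space */
};

/* Hypothetical stand-in for the register parts of a perf sample. */
struct model_sample {
	const struct model_regs *regs_user; /* source for PERF_SAMPLE_REGS_USER */
	const struct model_regs *regs_intr; /* source for PERF_SAMPLE_REGS_INTR */
};

/*
 * Only one register set ('regs') reaches this step, and both requested
 * register sets are derived from it. 'task_user_regs' stands in for the
 * task's saved user registers, used when the interrupt hit kernel code.
 */
void model_prepare_sample(struct model_sample *s,
			  const struct model_regs *regs,
			  const struct model_regs *task_user_regs)
{
	s->regs_intr = regs;
	s->regs_user = regs->user_mode ? regs : task_user_regs;
}
```

For a user-space sample, both pointers refer to the same snapshot, so uregs and iregs cannot differ. Making them differ would require passing a second register set down.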

Can we change this, such that perf_event_output also takes a second set of 
registers (iregs) that get sampled for PERF_SAMPLE_REGS_INTR? I'm very new to 
real kernel development, what kind of ABI/API stability guarantees exist for 
something like "perf_event_output"?

> - Independently, when I add a custom printk manually in `arch/x86/events/
> intel/ds.c` at the end of `setup_pebs_sample_data`, then I'm never seeing
> any differences between SP in iregs/pebs/regs. Shouldn't it also be
> recorded via PEBS? Or is it just chance that I'm never seeing any
> difference in setup_pebs_sample_data between iregs->sp and regs->sp?

The reason here seems to be that the registers stored in "pebs" are 
essentially the same as iregs when using `perf record --call-graph dwarf`. 
The difference is the availability of `pebs->real_ip`, which is used on my 
system to fix up the IP. SP stays untouched and is thus only truly valid for 
the original IP (which is currently discarded, see above).
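Below is a minimal C sketch of the fix-up just described. It assumes the fix-up boils down to "replace IP with real_ip, keep SP". The types and `apply_pebs_ip_fixup()` are hypothetical and much simpler than the real `setup_pebs_sample_data`. The optional `orig_ip` out-parameter models one possible way of not discarding the IP that actually matches SP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a PEBS record: only the field relevant here. */
struct pebs_model_record {
	uint64_t real_ip;  /* "precise" IP reported by the PEBS assist */
	bool has_real_ip;  /* whether this PEBS format provides it */
};

/* Hypothetical register snapshot taken when the PMI is handled. */
struct sample_regs_model {
	uint64_t ip;
	uint64_t sp;
};

/*
 * Replaces regs->ip with the PEBS IP and leaves regs->sp alone, so the
 * result pairs the PEBS IP with the SP of a later point in time.
 * If orig_ip is non-NULL, the IP that actually matches SP is saved there
 * instead of being discarded.
 */
void apply_pebs_ip_fixup(struct sample_regs_model *regs,
			 const struct pebs_model_record *pebs,
			 uint64_t *orig_ip)
{
	if (orig_ip)
		*orig_ip = regs->ip;
	if (pebs->has_real_ip)
		regs->ip = pebs->real_ip;
}
```

With the IP values quoted above, the IP moves from 0x7feda32ca68c to 0x7feda32ca688 while SP is unchanged. That is the mismatched pair the unwinder then receives.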

> - Generally, how do we want to handle this bug? If `--intr-regs` would
> actually record a different IP than stored in uregs in the perf.data file,
> then we could use that as a fallback for unwinding, when it fails the first
> time. Or should we always unwind from that IP? How do we mark the "actual"
> frame/IP then, if that differs?
> 
> Thanks




Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-07 Thread Milian Wolff
On Tuesday, 6 November 2018 21:24:11 CET Andi Kleen wrote:
> > Where would I look for the source to change here? So far, I only
> > concentrated on the userspace side of perf in tools/perf.
> 
> Kind of similar to
> 
> a405bad5ad20 perf/x86: Add Haswell specific transaction flag reporting
> fdfbbd07e91f perf: Add generic transaction flags
> 
> Report the original (not overwritten) regs->ip and regs->sp

Thanks a lot Andi! With your help, I have managed to find the exact issue for 
my scenario. It turns out that it really is "just" the instruction pointer 
that is wrong: originally we have IP = 0x7feda32ca68c, but with PEBS we 
correct that to IP = 0x7feda32ca688. The SP register value stays the same 
according to my printk output. Using the original IP value, we can unwind 
correctly, since it points to the correct place in the .eh_frame section. The 
PEBS IP points to a different position in the .eh_frame section, which is 
"too early".
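The following small C sketch makes the "too early" point concrete. It shows how a DWARF unwinder picks the CFA rule for a given IP. All table contents are hypothetical. The sketch only assumes that the 4 bytes between the PEBS IP and the original IP hold a stack adjustment such as `sub $0x28,%rsp`, which is 4 bytes long on x86-64. The sampled SP already includes that adjustment. Looking up the PEBS IP therefore selects the row from before it, and the return-address slot is computed 0x28 bytes too low.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a CFI table: from 'start' onwards, CFA = SP + cfa_offset. */
struct cfi_row {
	uint64_t start;
	uint64_t cfa_offset;
};

/*
 * Hypothetical rows for one function: it is assumed to start at
 * 0x7feda32ca660, with a 4-byte 'sub $0x28,%rsp' at 0x7feda32ca688, so a
 * new row begins right after it at 0x7feda32ca68c. The CFA offset of 16
 * before the sub is made up for illustration; no end bound is modelled.
 */
static const struct cfi_row rows[] = {
	{ 0x7feda32ca660, 16 },
	{ 0x7feda32ca68c, 16 + 0x28 },
};

/* Returns the row in effect at 'ip' (last row with start <= ip), or NULL. */
const struct cfi_row *cfi_lookup(uint64_t ip)
{
	const struct cfi_row *found = NULL;

	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		if (rows[i].start <= ip)
			found = &rows[i];
	return found;
}

/*
 * Address where the caller's return address is expected: on x86-64 it is
 * stored 8 bytes below the CFA. Returns 0 if 'ip' is not covered.
 */
uint64_t return_address_slot(uint64_t ip, uint64_t sp)
{
	const struct cfi_row *row = cfi_lookup(ip);

	return row ? sp + row->cfa_offset - 8 : 0;
}
```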

That brings up some questions:

- I noticed `perf record --intr-regs`, but the uregs and iregs values recorded 
in the perf.data file are always identical: comparing them, I always see the 
same values printed by `perf script`. This smells like a bug to me, but so far 
I haven't figured out why this happens...

- Independently, when I manually add a custom printk in 
`arch/x86/events/intel/ds.c` at the end of `setup_pebs_sample_data`, I never 
see any difference in SP between iregs/pebs/regs. Shouldn't SP also be 
recorded via PEBS? Or is it just chance that I never see any difference in 
setup_pebs_sample_data between iregs->sp and regs->sp?

- Generally, how do we want to handle this bug? If `--intr-regs` actually 
recorded a different IP than the one stored in uregs in the perf.data file, 
then we could use that as a fallback for unwinding when the first attempt 
fails. Or should we always unwind from that IP? How do we mark the "actual" 
frame/IP then, if the two differ?

Thanks



[PATCH] perf script: share code and output format for uregs and iregs output

2018-11-07 Thread Milian Wolff
The iregs output was missing the newline at the end as well as the leading
ABI output. This made it hard to compare the iregs and uregs values.
Instead, use a single function to output the register values and use
it for both iregs and uregs, to ensure the output is consistent.
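Below is a minimal sketch of this approach. It is written against hypothetical types rather than perf's `struct regs_dump` and `perf_reg_name()`. A single printer emits the ABI, then every register selected by the mask, then a trailing newline. The register fields use the same `%5s:0x%<hex>` style as the existing uregs output, so both register sets come out in the same shape.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register dump: values stored densely, in mask bit order. */
struct regs_dump_model {
	uint64_t abi;
	uint64_t mask;          /* bit i set => register i was sampled */
	const uint64_t *values; /* one value per set bit, lowest bit first */
};

/* Hypothetical register names indexed by bit number. */
static const char *const reg_names[] = { "AX", "BX", "CX", "DX" };

/*
 * Shared printer for uregs and iregs: ABI first, then each sampled
 * register, then a newline. Returns the number of characters written.
 */
int fprintf_regs(const struct regs_dump_model *d, FILE *fp)
{
	int printed = fprintf(fp, " ABI:%" PRIu64 " ", d->abi);
	size_t n = 0;

	for (unsigned int bit = 0; bit < 64; bit++) {
		if (!(d->mask & (UINT64_C(1) << bit)))
			continue;
		const char *name = bit < sizeof(reg_names) / sizeof(reg_names[0])
				   ? reg_names[bit] : "??";
		printed += fprintf(fp, "%5s:0x%" PRIx64 " ", name, d->values[n++]);
	}

	printed += fprintf(fp, "\n");
	return printed;
}
```

Calling it once with the user registers and once with the interrupt registers yields two blocks with identical layout, each terminated by a newline.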

Before:

```
perf  7049 [-01]  1343.354347:  1 cycles:ppp:
a7bc21ce perf_event_exec+0x18e 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7ead3 setup_new_exec+0xf3 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7cd7be5 load_elf_binary+0x395 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7e540 search_binary_handler+0x80 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7f1aa __do_execve_file.isra.13+0x58a 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7f561 do_execve+0x21 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7f596 __x64_sys_execve+0x26 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7a041cb do_syscall_64+0x5b 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a840008c entry_SYSCALL_64+0x7c 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
   AX:0x8000BX:0x0CX:0x0DX:0x7SI:0xfDI:0x286
BP:0x95bc8213a460SP:0xacbf0ba97d18IP:0xa7bc21cd 
FLAGS:0x28eCS:0x10SS:0x18R8:0x2R9:0x21440   R10:0x33816fb3b8c   
R11:0x1   R12:0x95bc8213a460   R13:0x95bc8213a400   
R14:0x95bc8213a400   R15:0x1  ABI:2AX:0xffda
BX:0xCX:0x7f84ad85798bDX:0x560209699d50
SI:0x7ffe2c7a6820DI:0x7ffe2c7a8c9bBP:0x7ffe2c7a20d0
SP:0x7ffe2c7a2058IP:0x7f84ad85798b FLAGS:0x206CS:0x33SS:0x2b
R8:0x7ffe2c7a2030R9:0x7f84ae55f010   R10:0x8   R11:0x206   
R12:0x   R13:0x   R14:0x   
R15:0x

perf  7049 [-01]  1343.354363:  1 cycles:ppp:
...
```

After:

```
perf  7049 [-01]  1343.354347:  1 cycles:ppp:
a7bc21ce perf_event_exec+0x18e 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7ead3 setup_new_exec+0xf3 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7cd7be5 load_elf_binary+0x395 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7e540 search_binary_handler+0x80 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7f1aa __do_execve_file.isra.13+0x58a 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7f561 do_execve+0x21 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7c7f596 __x64_sys_execve+0x26 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a7a041cb do_syscall_64+0x5b 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
a840008c entry_SYSCALL_64+0x7c 
(/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
 ABI:2AX:0x8000BX:0x0CX:0x0DX:0x7SI:0xfDI:0x286
BP:0x95bc8213a460SP:0xacbf0ba97d18IP:0xa7bc21cd 
FLAGS:0x28eCS:0x10SS:0x18R8:0x2R9:0x21440   R10:0x33816fb3b8c   
R11:0x1   R12:0x95bc8213a460   R13:0x95bc8213a400   
R14:0x95bc8213a400   R15:0x1
 ABI:2AX:0xffdaBX:0xCX:0x7f84ad85798b   
 DX:0x560209699d50SI:0x7ffe2c7a6820DI:0x7ffe2c7a8c9b
BP:0x7ffe2c7a20d0SP:0x7ffe2c7a2058IP:0x7f84ad85798b FLAGS:0x206
CS:0x33SS:0x2bR8:0x7ffe2c7a2030R9:0x7f84ae55f010   R10:0x8   
R11:0x206   R12:0x   R13:0x   
R14:0x   R15:0x

perf  7049 [-01]  1343.354363:  1 cycles:ppp:
...
```

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 40 -
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index daf73832743e..04913136bac9 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -566,30 +566,10 @@ static int perf_session__check_output_opt(struct perf_session *session)
return 0;
 }
 
-static int perf_sample__fprintf_iregs(struct perf_sample *sample,
- struct perf_event_attr *attr, FILE *fp)
-{
-   struct regs_dump *regs = &sample->intr_regs;
-   uint64_t mask = attr->sample_regs_intr;
-   unsi

[PATCH] perf script: add newline after uregs output

2018-11-07 Thread Milian Wolff
This change makes it much easier to distinguish between
consecutive samples by keeping the empty line between them,
as we see when uregs output is not enabled.

Before:

```
cpp-inlining 28298 [-01] 54837.342780:3068085 cycles:pp:
77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
...
 ABI:2AX:0x0BX:0x40f56cf6CX:0x294a3ae7...
cpp-inlining 28298 [-01] 54837.344493:2881929 cycles:pp:
77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
...
 ABI:2AX:0x40d440c7BX:0x40d440c7CX:0x4d45e5da...
```

After:

```
cpp-inlining 28298 [-01] 54837.342780:3068085 cycles:pp:
77c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
...
 ABI:2AX:0x0BX:0x40f56cf6CX:0x294a3ae7...

cpp-inlining 28298 [-01] 54837.344493:2881929 cycles:pp:
77c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
...
 ABI:2AX:0x40d440c7BX:0x40d440c7CX:0x4d45e5da...
```

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index b5bc85bd0bbe..daf73832743e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -603,6 +603,8 @@ static int perf_sample__fprintf_uregs(struct perf_sample *sample,
printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), val);
}
 
+   fprintf(fp, "\n");
+
return printed;
 }
 
-- 
2.19.1


Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-06 Thread Milian Wolff
On Tuesday, 6 November 2018 09:39:57 CET Jiri Olsa wrote:
> On Mon, Nov 05, 2018 at 04:10:37PM -0800, Andi Kleen wrote:
> > > > > - PMU triggers interrupt and PEBS stores RIP etc.
> > > > > - code continous to execute, possibly changing the stack
> > > > 
> > > > I dont think the code continues to execute.. the stack is ok
> > > 
> > > Are you sure about this? I mean, isn't that the whole reason why we need
> > > PEBS? Generally, if you are sure about this, can you point me to some
> > > documentation on this to allow me to understand it better?
> > 
> > Milian is right.
> > 
> > There is a execution window from PEBS capturing registers to actually
> > triggering the PMU, and if there is stack manipulation in that window
> > the PEBS state might be out of sync with the real stack.
> 
> hum, is this about having 'large pebs' or there's this window
> if there's also only single pebs record allowed? which should
> be case for dwarf unwind
> 
> > The right RIP/RSP to use for the stack unwinding is always the data
> > in the PMI's exception frame on the stack.
> > 
> > Probably would need to modify perf to report those too in addition
> > to the PEBS registers.
> 
> ok, should not be that hard

Where would I look for the source to change here? So far, I have only focused 
on the user-space side of perf in tools/perf.

Thanks



Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-05 Thread Milian Wolff
On Monday, 5 November 2018 21:51:19 CET Jiri Olsa wrote:
> On Fri, Nov 02, 2018 at 06:56:50PM +0100, Milian Wolff wrote:
> 
> SNIP
> 
> > > > Note how precise levels 0 and 1 do not produce any samples where
> > > > unwinding
> > > > fails. But precise level 2 produces some, and precise level 3
> > > > increases
> > > > the
> > > > amount (by ca. ~2x).
> > > > 
> > > > I can reproduce this pattern on two separate Intel CPUs and kernel
> > > > versions
> > > > currently:
> > > > 
> > > > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > > > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> > > > 
> > > > Could someone else try this? What about AMD and IBS - is it also
> > > > affected?
> > > > What about newer/different Intel CPUs?
> > > 
> > > I tried on intel and can't actualy see that.. how do the failed samples
> > > look like? like is there the stack dump attached, what's in the regs?
> > > 
> > > could you please paste the 'perf report -D' output for some of the
> > > failed samples?
> > 
> > See here for one case: https://paste.kde.org/prryvdilq
> 
> we should really print some helpfull debug output
> for this.. like to show some markers where the stack
> data starts

Further down, the offset of the ustack start is given (0xe0). But yes, such 
markers would be welcome.

> > What Intel CPU did you use? What microcode version? Which kernel version?
> > 
> > Generally, isn't what I'm seeing actually a neccessary evil of the ustack
> > based unwinding in perf? I mean, the general procedure is as follows if
> > I'm
> > not mistaken:
> > 
> > - PMU triggers interrupt and PEBS stores RIP etc.
> > - code continous to execute, possibly changing the stack
> 
> I dont think the code continues to execute.. the stack is ok

Are you sure about this? I mean, isn't that the whole reason why we need PEBS? 
Generally, if you are sure about this, can you point me to some documentation 
on this to allow me to understand it better?

Also, how do you explain the scenario I am seeing, with `cycles:` and 
`cycles:p` not suffering from this issue, but `cycles:pp` and `cycles:ppp` 
leading to broken samples? It _has_ to be PEBS then, no? What else could 
explain this?

> the problem I saw in past is that the copy from user is not
> 100% and sometimes you might not get full stack data you
> asked for

But that would indicate missing data at the end of the ustack dump? In our 
case, the "problematic" data is always at the start.

Also note the apparent shift in the ustack copy, which in one case directly 
correlates with the code being executed, i.e. from objdump in libm I see:

0x00029688 <+40>:    sub    $0x28,%rsp
(https://paste.kde.org/poywa7y2z)

The address of the expected parent frame is 77c7caf8 (hypotf32x+0x18). 
It can be found at offset 80 in the ustack dump (cf. 
https://paste.kde.org/prryvdilq: "f9 ca c7 f7 ff 7f" is found at 0x130; minus 
0xe0 yields 0x50, i.e. 80).

From the libunwind (or libdw) debug output in perf, we see that the unwinder 
tries to access offset 32 (cf. https://paste.kde.org/prryvdilq#line-610), 
which is off by 48 from the desired value of 80. This offset is *very* 
close to the value of 40 we see in the libm disassembly for __hypot_function 
("$0x28,%rsp"). Is this really just a coincidence?
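The manual search described above could be automated with a small helper like the one below. The search finds the little-endian bytes of the expected return address in the ustack dump, then expresses that position relative to the start of the dump. `find_stack_slot()` is a hypothetical name. It assumes that return addresses sit in 8-byte-aligned slots relative to the sampled SP, which is where the ustack copy starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 64-bit value, as stored on an x86-64 stack. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Returns the offset (relative to the start of the ustack copy, i.e. the
 * sampled SP) of the first 8-byte slot holding 'value', or -1 if absent.
 */
long find_stack_slot(const unsigned char *ustack, size_t size, uint64_t value)
{
	for (size_t off = 0; off + 8 <= size; off += 8)
		if (load_le64(ustack + off) == value)
			return (long)off;
	return -1;
}
```

For example, placing the bytes `f9 ca c7 f7 ff 7f 00 00` at offset 80 of a zeroed buffer makes the helper return 80 for the value 0x7ffff7c7caf9.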

> have you tried with libdw unwinder? if one of the unwinder
> shows more callchains, we need to fix the other one ;-)

Yes, I've looked at both unwinders. Both try to access the same values, and 
both break due to seemingly wrong data being read from the stack. And if you 
look at my other patches, you may have seen that I've regularly fixed the 
libdw unwinder to bring it closer to libunwind.

Thanks


Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-02 Thread Milian Wolff
On Friday, 2 November 2018 12:26:35 CET Jiri Olsa wrote:
> On Thu, Nov 01, 2018 at 11:08:18PM +0100, Milian Wolff wrote:
> > On Tuesday, 30 October 2018 23:34:35 CET Milian Wolff wrote:
> > > On Wednesday, 24 October 2018 16:48:18 CET Andi Kleen wrote:
> > > > > Can someone at least confirm whether unwinding from a function
> > > > > prologue
> > > > > via
> > > > > .eh_frame (but without .debug_frame) should actually be possible?
> > > > 
> > > > Yes it should be possible. Asynchronous unwind tables should work
> > > > from any instruction.
> > 
> > 
> > 
> > > We can find `7f91345bdaf8+1 = 7f91345bdaf9" at offset 16 (search for "f9
> > > da
> > > 5b 34 91 7f"). Using that address makes unwinding work for this sample.
> > > What could be the reason for this shift?
> > 
> > I believe I have found the culprit: PEBS seems to be at fault here - i.e.
> > the RIP/RSP and the ustack dump of the sample simply don't fit together.
> > 
> > Check this out:
> > 
> > ```
> > $ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 
> > $ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 
> > $ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 37
> > 39
> > 35
> > 28
> > 40
> > 39
> > 29
> > 37
> > 31
> > 26
> > 
> > $ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
> > 79
> > 70
> > 76
> > 77
> > 70
> > 90
> > 64
> > 78
> > 86
> > 74
> > ```
> > 
> > Note how precise levels 0 and 1 do not produce any samples where unwinding
> > fails. But precise level 2 produces some, and precise level 3 increases
> > the
> > amount (by ca. ~2x).
> > 
> > I can reproduce this pattern on two separate Intel CPUs and kernel
> > versions
> > currently:
> > 
> > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> > 
> > Could someone else try this? What about AMD and IBS - is it also affected?
> > What about newer/different Intel CPUs?
> 
> I tried on Intel and can't actually see that. What do the failed samples
> look like? Is there a stack dump attached, and what's in the regs?
> 
> could you please paste the 'perf report -D' output for some of the
> failed samples?

See here for one case: https://paste.kde.org/prryvdilq

What Intel CPU did you use? What microcode version? Which kernel version?

Generally, isn't what I'm seeing actually a necessary evil of the ustack-based 
unwinding in perf? If I'm not mistaken, the general procedure is as follows:

- PMU triggers interrupt and PEBS stores RIP etc.
- code continues to execute, possibly changing the stack
- PMU interrupt is handled, and a perf sample is generated
  - register values are used from "past" status stored in PEBS
  - but ustack dump is based on the "current" status

From this latter discrepancy, it follows directly that *sometimes* the 
ustack dump represents a state that cannot be unwound from, no?
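The sequence above can be modeled in a few lines. This is a toy model of the hypothesis only (registers taken from the PEBS record, stack bytes copied later at interrupt time), not perf code; `Sample`, `take_sample` and `unwinder_read` are hypothetical names. If SP moved up by 8 bytes between the PEBS record and the interrupt, a read at `sp + 24` lands on the word that really lives at `sp + 32`, and the wanted word shows up at offset 16 instead:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model, not perf code: what ends up in one sample.
struct Sample {
    std::uint64_t sp;                   // SP reported with the sample (from PEBS)
    std::vector<std::uint64_t> ustack;  // stack words copied at interrupt time
};

// Copies `words` 8-byte stack slots starting at the SP that is current when
// the interrupt is handled, but labels the sample with the SP from PEBS.
Sample take_sample(const std::map<std::uint64_t, std::uint64_t>& memory,
                   std::uint64_t sp_at_pebs, std::uint64_t sp_at_irq,
                   std::size_t words) {
    Sample s{sp_at_pebs, {}};
    for (std::size_t i = 0; i < words; ++i) {
        auto it = memory.find(sp_at_irq + 8 * i);
        s.ustack.push_back(it == memory.end() ? 0 : it->second);
    }
    return s;
}

// An unwinder assumes ustack[0] holds the word at s.sp, so the word at
// s.sp + offset is taken from index offset / 8.
std::uint64_t unwinder_read(const Sample& s, std::uint64_t offset) {
    assert(offset % 8 == 0);
    return s.ustack.at(offset / 8);
}
```

With `sp_at_pebs == sp_at_irq` the model reads the right slot; any difference between the two shifts every read by that difference.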

Cheers


PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-01 Thread Milian Wolff
On Dienstag, 30. Oktober 2018 23:34:35 CET Milian Wolff wrote:
> On Mittwoch, 24. Oktober 2018 16:48:18 CET Andi Kleen wrote:
> > > Can someone at least confirm whether unwinding from a function prologue
> > > via
> > > .eh_frame (but without .debug_frame) should actually be possible?
> > 
> > Yes it should be possible. Asynchronous unwind tables should work
> > from any instruction.



> We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for "f9 da
> 5b 34 91 7f"). Using that address makes unwinding work for this sample.
> What could be the reason for this shift?

I believe I have found the culprit: PEBS seems to be at fault here - i.e. the 
RIP/RSP and the ustack dump of the sample simply don't fit together.

Check this out:

```
$ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf \
./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
"hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf \
./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
"hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf \
./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
"hypot_finite.*\n.*\[unknown\]"; done
37
39
35
28
40
39
29
37
31
26

$ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf \
./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
"hypot_finite.*\n.*\[unknown\]"; done
79
70
76
77
70
90
64
78
86
74
```

Note how precise levels 0 and 1 do not produce any samples where unwinding 
fails. But precise level 2 produces some, and precise level 3 roughly doubles 
the amount.
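For readers less familiar with pcre2grep's multi-line mode: the numbers above count samples in which a frame line mentioning `hypot_finite` is immediately followed by an `[unknown]` frame. A rough C++ equivalent of that check might look like this. The function name is made up, and it ignores corner cases of how `pcre2grep -c` tallies overlapping matches:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Counts lines mentioning "hypot_finite" that are directly followed by a
// line containing "[unknown]", i.e. samples where unwinding gave up right
// after the __hypot_finite frame. Approximates the pcre2grep -M pattern
// "hypot_finite.*\n.*\[unknown\]" used above.
int count_failed_unwinds(const std::string& script_output) {
    std::istringstream in(script_output);
    std::string prev, line;
    int count = 0;
    while (std::getline(in, line)) {
        if (prev.find("hypot_finite") != std::string::npos &&
            line.find("[unknown]") != std::string::npos) {
            ++count;
        }
        prev = line;
    }
    return count;
}
```

Fed the output of `perf script`, it should give counts of the same order as the pcre2grep runs.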

I can currently reproduce this pattern on two separate Intel CPUs and kernel 
versions:

Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts

Could someone else try this? What about AMD and IBS - is it also affected? 
What about newer/different Intel CPUs?

Better yet, can someone come up with a fix for this on Intel with maximum 
precise level?

Thanks


[tip:perf/urgent] perf unwind: Take pgoff into account when reporting elf to libdwfl

2018-10-31 Thread tip-bot for Milian Wolff
Commit-ID:  1fe627da30331024f453faef04d500079b901107
Gitweb: https://git.kernel.org/tip/1fe627da30331024f453faef04d500079b901107
Author: Milian Wolff 
AuthorDate: Mon, 29 Oct 2018 15:16:44 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 31 Oct 2018 09:57:50 -0300

perf unwind: Take pgoff into account when reporting elf to libdwfl

libdwfl parses an ELF file itself and creates mappings for the
individual sections. perf on the other hand sees raw mmap events which
represent individual sections. When we encounter an address pointing
into a mapping with pgoff != 0, we must take that into account and
report the file at the non-offset base address.
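The address arithmetic described above can be sketched as follows. This is a minimal sketch, not the patch itself: it assumes a mapping record with `start`, `end` and `pgoff` fields in the spirit of perf's mmap events. `Mapping`, `elf_report_base` and `file_offset` are hypothetical names, and the numbers in the usage note are made up. A mapping whose first byte is file offset `pgoff` implies that file offset 0 would sit `pgoff` bytes lower in memory, at least in the usual case where the ELF load segments keep file offsets and virtual addresses in step:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one mmap event as seen by perf.
struct Mapping {
    std::uint64_t start;  // virtual address where the mapping begins
    std::uint64_t end;    // one past the last mapped address
    std::uint64_t pgoff;  // file offset mapped at `start`
};

// Address at which file offset 0 would be mapped, i.e. the base to report
// for the whole ELF file instead of the start of this one mapping.
std::uint64_t elf_report_base(const Mapping& m) {
    assert(m.pgoff <= m.start);
    return m.start - m.pgoff;
}

// File offset corresponding to a sampled address inside the mapping.
std::uint64_t file_offset(const Mapping& m, std::uint64_t addr) {
    assert(addr >= m.start && addr < m.end);
    return addr - m.start + m.pgoff;
}
```

For example, a mapping starting at 0x7f0000022000 with pgoff 0x22000 gives a report base of 0x7f0000000000. Reporting the file at `start` instead would shift every address inside it by `pgoff`.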

This fixes unwinding with libdwfl in some cases. E.g. for a file like:

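```
#include <complex>
#include <future>
#include <iostream>
#include <mutex>
#include <random>
#include <vector>

using namespace std;

mutex g_mutex;

double worker()
{
    // hold g_mutex for the whole computation, including the output
    lock_guard<mutex> guard(g_mutex);
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 1000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << endl;
    return s;
}

int main()
{
    vector<future<double>> results;
    for (int i = 0; i < 1; ++i) {
        // launch::async forces a new thread for each worker
        results.push_back(async(launch::async, worker));
    }
    return 0;
}
```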

Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp -o cpp-locking`,
then record it with `perf record --call-graph dwarf -e sched:sched_switch`.

When you analyze it with `perf script` and libunwind, you should see:

```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking 
prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 
next_prio=120
b166fec5 __sched_text_start+0x545 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b166fec5 __sched_text_start+0x545 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b1670208 schedule+0x28 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b16737cc rwsem_down_read_failed+0xec 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b1665e04 call_rwsem_down_read_failed+0x14 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b1672a03 down_read+0x13 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b106bd85 __do_page_fault+0x445 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
b18015f5 page_fault+0x45 
(/lib/modules/4.14.78-1-lts/build/vmlinux)
7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
7f38e463fa6d std::basic_streambuf 
>::sputn(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator 
>::_M_put(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator 
> std::__write(std::ostreambuf_iterator >, 
char const*, int)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator 
> std::num_put > 
>::_M_insert_float(std::ostreambuf_iterator
7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl>
7f38e464bd70 std::ostream& 
std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
563b9cb502f7 worker()+0xb7 
(/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
563b9cb506fb double std::__invoke_impl(std::__invoke_other, double (*&&)())+0x2b (inlined)
563b9cb506fb std::__invoke_result::type 
std::__invoke(double (*&&)())+0x2b (inlined)
563b9cb506fb decltype (__invoke((_S_declval<0ul>)())) 
std::thread::_Invoker 
>::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x2b (inlined)
563b9cb506fb std::thread::_Invoker 
>::operator()()+0x2b (inlined)
563b9cb506fb 
std::__future_base::_Task_setter,
 std::__future_base::_Result_base::_Deleter>, 
std::thread::_Invoker >, dou>
563b9cb506fb 
std::_Function_handler (), 
std::__future_base::_Task_setter
563b9cb507e8 
std::function ()>::operator()() const+0x28 
(inlined)
563b9cb507e8 
std::__future_base::_State_baseV2::_M_do_set(std::function ()>*, bool*)+0x28 (/ssd/milian/>
7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so)
563b9cb51149 __gthread_once+0xe9 (inlined)
563b9cb51149 void std::call_once ()>*, bool*)>
 

Re: Broken dwarf unwinding - wrong stack pointer register value?

2018-10-30 Thread Milian Wolff
_step: dwarf_step returned 1
  >_Ux86_64_step: returning 1
 >_Ux86_64_step: (cursor=0x7fffafa55c10, ip=0xc0d885722245b5e4, 
cfa=0x7ffd1e276f38)
   >_Ux86_64_step: dwarf_step returned -22
  >_Ux86_64_step: returning -22
unwind: __hypot_finite:ip = 0x7f91345d77b4 (0x297b4)
unwind: '':ip = 0xc0d885722245b5e3 (0x0)

7f91345d77b4 __hypot_finite+0x154 (/usr/lib/libm-2.28.so)
c0d885722245b5e3 [unknown] ([unknown])
```


Now, I also tried the following:

```
$ perf probe -x /usr/lib/libm-2.28.so -a __hypot_finite+0x154
$ perf record -F 1000 --call-graph dwarf -e probe_libm:__hypot_finite ./cpp-
inlining
```

And all of the samples unwind correctly! This makes me believe that the 
.eh_frame information is not what's wrong; otherwise unwinding would always 
fail from these locations, especially when using the custom probe trace point. 
Since that is not happening, what else could it be? I only see two 
possibilities: the register values or the stack memory stored in the sample 
by perf.

The register values are an unlikely culprit, since I now understand how the 
.eh_frame contents get analyzed. For __hypot_finite+0x154, we will always end 
up asking for the address at SP+24. access_mem will thus always read the value 
at offset 24 into the stack dump, independent of the actual value of SP.
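The "always offset 24" argument is plain arithmetic. Assume the CFI row at `__hypot_finite+0x154` reads `CFA = rsp + 32`. That value is inferred here from the SP+24 above, since on x86-64 the return address is saved at CFA - 8; it was not read from libm's .eh_frame. Under that assumption, the slot the unwinder asks for sits at the same offset into the dump whatever SP's value is, as long as the dump starts at SP. `ra_dump_offset` is a hypothetical helper, not a libunwind function:

```cpp
#include <cassert>
#include <cstdint>

// On x86-64 the return address is saved at CFA - 8.
constexpr std::uint64_t kReturnAddressBelowCfa = 8;

// Offset into a user-stack dump beginning at `dump_base` at which the
// unwinder looks for the return address, given the sampled SP and a CFI
// rule of the form CFA = SP + cfa_sp_offset.
std::uint64_t ra_dump_offset(std::uint64_t sp, std::uint64_t cfa_sp_offset,
                             std::uint64_t dump_base) {
    const std::uint64_t cfa = sp + cfa_sp_offset;
    const std::uint64_t ra_addr = cfa - kReturnAddressBelowCfa;
    assert(ra_addr >= dump_base);
    return ra_addr - dump_base;
}
```

With `dump_base == sp`, SP cancels out and the result is `cfa_sp_offset - 8`, i.e. 24 for the assumed rule.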

So what remains is that the stack dump is somehow wrong, i.e. its contents 
are shifted by some offset. Note how I can "fix" the unwinding for such broken 
samples by manually applying an offset in access_mem. By looking at other 
samples where unwinding works from __hypot_finite, I could figure out that the 
correct address to be read for unwinding should be 7f91345bdaf8, e.g.:

```
7f91345d76ed __hypot_finite+0x8d (/usr/lib/libm-2.28.so)
7f91345bdaf8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
5620579cb128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/
build/tests/test-clients/cpp-inlining/cpp-inlining)
```

This address indeed occurs in the user stack dump (starting at 0xe0 in the raw 
event data) for the broken sample, cf.:


```
.  00e0:  00 20 00 00 00 00 00 00 c0 b1 9c 57 20 56 00 00  . .W V..
.  00f0:  70 70 27 1e fd 7f 00 00 f9 da 5b 34 91 7f 00 00  pp'...[4
.  0100:  e4 b5 45 22 72 85 d8 c0 c0 1d 16 84 43 30 bb c0  ..E"r...C0..
.
```

We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for 
"f9 da 5b 34 91 7f"). Using that address makes unwinding work for this sample. 
What could be the reason for this shift?
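The offsets can be checked by decoding the bytes shown above. This minimal sketch assumes, as in the `PERF_SAMPLE_STACK_USER` layout, that the first 8 bytes at 00e0 are the u64 size of the copied stack (0x2000, i.e. 8192 bytes) and that the stack copy itself starts at 00e8. `kRawDump`, `read_le64` and `stack_word` are names made up for this sketch. Reading the word 24 bytes into the copy yields 0xc0d885722245b5e4, the same bogus ip libunwind reports in the log above. The word 16 bytes in is 0x7f91345bdaf9:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes from rows 00e0..0100 of the raw event dump above, split into 8-byte
// words. Assumption: the first word is the u64 size of the user stack copy,
// so the copied stack memory itself starts at 00e8.
const std::vector<std::uint8_t> kRawDump = {
    0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // 00e0: size (0x2000)
    0xc0, 0xb1, 0x9c, 0x57, 0x20, 0x56, 0x00, 0x00,  // 00e8: stack offset 0
    0x70, 0x70, 0x27, 0x1e, 0xfd, 0x7f, 0x00, 0x00,  // 00f0: stack offset 8
    0xf9, 0xda, 0x5b, 0x34, 0x91, 0x7f, 0x00, 0x00,  // 00f8: stack offset 16
    0xe4, 0xb5, 0x45, 0x22, 0x72, 0x85, 0xd8, 0xc0,  // 0100: stack offset 24
    0xc0, 0x1d, 0x16, 0x84, 0x43, 0x30, 0xbb, 0xc0,  // 0108: stack offset 32
};

// Decodes a little-endian 64-bit value starting at byte `pos`.
std::uint64_t read_le64(const std::vector<std::uint8_t>& bytes, std::size_t pos) {
    assert(pos + 8 <= bytes.size());
    std::uint64_t value = 0;
    for (std::size_t i = 8; i-- > 0;) {  // most significant byte first
        value = (value << 8) | bytes[pos + i];
    }
    return value;
}

// Word at `offset` bytes into the copied stack, i.e. what an unwinder reads
// for SP + offset if the copy starts exactly at the sampled SP.
std::uint64_t stack_word(std::size_t offset) {
    return read_le64(kRawDump, 8 + offset);  // skip the u64 size field
}
```

Under these assumptions the expected return address sits exactly 8 bytes below where the unwinder looks. That is consistent with the stack copy starting 8 bytes above the SP recorded with the sample.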

Thanks


Re: [PATCH] perf util: take pgoff into account when reporting elf to libdwfl

2018-10-30 Thread Milian Wolff
On Montag, 29. Oktober 2018 18:40:14 CET Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 29, 2018 at 04:26:27PM +0100, Milian Wolff escreveu:
> > On Monday, October 29, 2018 3:16:44 PM CET Milian Wolff wrote:
> > > Libdwfl parses an ELF file itself and creates mappings for the
> > > individual sections. Perf on the other hand sees raw mmap events which
> > > represent individual sections. When we encounter an address pointing
> > > into a mapping with pgoff != 0, we must take that into account and
> > > report the file at the non-offset base address.
> > 
> > > This fixes unwinding with libdwfl in some cases. E.g. for a file like:
> > 
> > 
> > > Note that the backtrace is still stopping too early, when
> > > compared to the nice results obtained via libunwind. It's
> > > unclear so far what the reason for that is.
> > 
> > The remaining issue is due to a bug in elfutils:
> > 
> > https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html
> > 
> > With both patches applied, libunwind and elfutils produce the same output
> > for the above scenario.
> 
> I'm updating the patch to remove:
> 
> "It's unclear so far what the reason for that is."
> 
> Adding:
> 
> "See https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html for
> a patch fixing that."
> 
> Ok?

Yes, thanks. I figured the fix for elfutils out after I submitted the perf 
patch.

> Or are you saying that that "unclear" part applies to both libunwind
> and elfutils?

No, libunwind worked fine without these patches for this specific case.

Cheers


Re: [PATCH] perf util: take pgoff into account when reporting elf to libdwfl

2018-10-29 Thread Milian Wolff
On Monday, October 29, 2018 3:16:44 PM CET Milian Wolff wrote:
> Libdwfl parses an ELF file itself and creates mappings for the
> individual sections. Perf on the other hand sees raw mmap events which
> represent individual sections. When we encounter an address pointing
> into a mapping with pgoff != 0, we must take that into account and
> report the file at the non-offset base address.
> 
> This fixes unwinding with libdwfl in some cases. E.g. for a file like:



> Note that the backtrace is still stopping too early, when
> compared to the nice results obtained via libunwind. It's
> unclear so far what the reason for that is.

The remaining issue is due to a bug in elfutils:

https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html

With both patches applied, libunwind and elfutils produce the same output for 
the above scenario.

Cheers


[PATCH] perf util: take pgoff into account when reporting elf to libdwfl

2018-10-29 Thread Milian Wolff
ke_impl >, double>::_Async_state_impl(std::thread::_Invoker
563b9cb51149 
std::__invoke_result >, double>::_Async_state_impl(std::thread::_Invoker >>
563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) 
std::thread::_Invoker >, double>::_Async_state_>
563b9cb51149 
std::thread::_Invoker >, double>::_Async_state_impl(std::thread::_Invoker
563b9cb51149 
std::thread::_State_impl >, double>::_Async_state_impl(std::thread>
7f38e45f0062 execute_native_thread_routine+0x12 
(/usr/lib/libstdc++.so.6.0.25)
7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so)
7f38e42ccb22 __GI___clone+0x42 (inlined)
```

Before this patch, using libdwfl, you would see:

```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
a041161e77950c5c [unknown] ([unknown])
```

With this patch applied, we get a bit further in unwinding:

```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
b1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
b18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
7f38e463fa6d std::basic_streambuf >::sputn(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator >::_M_put(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator > std::__write(std::ostreambuf_iterator >, char const*, int)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator > std::num_put > >::_M_insert_float(std::ostreambuf_iterator
7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl>
7f38e464bd70 std::ostream& std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
6eab825c1ee3e4ff [unknown] ([unknown])
```

Note that the backtrace still stops too early compared to the results
obtained via libunwind. The reason for this is unclear so far.

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
---
 tools/perf/util/unwind-libdw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 6f318b15950e..5eff9bfc5758 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -45,13 +45,13 @@ static int __report_module(struct addr_location *al, u64 ip,
Dwarf_Addr s;
 
dwfl_module_info(mod, NULL,

[tip:perf/urgent] perf script: Flush output stream after events in verbose mode

2018-10-26 Thread tip-bot for Milian Wolff
Commit-ID:  7ee40678af935fb489b0c6cf0f75808175214cd7
Gitweb: https://git.kernel.org/tip/7ee40678af935fb489b0c6cf0f75808175214cd7
Author: Milian Wolff 
AuthorDate: Sun, 21 Oct 2018 21:14:24 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 22 Oct 2018 14:27:11 -0300

perf script: Flush output stream after events in verbose mode

When perf script writes to a terminal stream, its normal output gets
buffered while its debug output is written out directly. This made it
quite hard to tell which event a given debug message belongs to.

We can improve on this by flushing the output buffer after processing an
event. To see the value, compare the following output for a `perf script
-v` run:

Before this patch:
```
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
... lots and lots of verbose debug output
cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)

cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

After this patch:
```
...
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)

unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

This interleaved output makes it much easier to use perf script for
debugging purposes, e.g. to investigate broken DWARF unwinding.
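
Why the flush restores the ordering can be shown with a minimal, self-contained sketch. It assumes a fully buffered event stream and a separate, unbuffered debug stream. `print_event_sketch` is a hypothetical stand-in for the per-event output path, not perf's `process_event()`:

```c
#include <stdio.h>

/* Hypothetical stand-in for perf script's per-event output: debug
 * messages go to `dbg` as soon as they are produced, while the event
 * text goes to the buffered stream `fp`. */
static void print_event_sketch(FILE *fp, FILE *dbg, int verbose,
                               const char *event)
{
    if (verbose)
        fprintf(dbg, "unwind: debug output for %s\n", event);

    fprintf(fp, "%s\n", event);

    /* Same idea as the patch: in verbose mode, flush after every event so
     * the event text shows up right after its own debug messages instead
     * of sitting in the buffer until it fills up. */
    if (verbose)
        fflush(fp);
}
```

Without the flush, several events' worth of output can accumulate in `fp`'s buffer while their debug lines have already been printed. That accumulation produces the separated "before" output shown above.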

Signed-off-by: Milian Wolff 
Acked-by: Jiri Olsa 
Link: http://lkml.kernel.org/r/20181021191424.16183-2-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index bd468b90801b..ca09b7d2adb7 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
 
if (PRINT_FIELD(METRIC))
perf_sample__fprint_metric(script, thread, evsel, sample, fp);
+
+   if (verbose)
+   fflush(fp);
 }
 
 static struct scripting_ops*scripting_ops;


[tip:perf/urgent] perf script: Allow extended console debug output

2018-10-26 Thread tip-bot for Milian Wolff
Commit-ID:  c1c9b9695cc8868048f45c7e2559f65bc0be7382
Gitweb: https://git.kernel.org/tip/c1c9b9695cc8868048f45c7e2559f65bc0be7382
Author: Milian Wolff 
AuthorDate: Sun, 21 Oct 2018 21:14:23 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 22 Oct 2018 12:37:53 -0300

perf script: Allow extended console debug output

The script tool isn't using a browser, yet use_browser wasn't set
explicitly to zero. This in turn led to confusing output such as:

  ```
  $ perf script -vvv ...
  ...
  overlapping maps in /home/milian/foobar (disable tui for more info)
  ...
  ```

Explicitly set use_browser to 0, which makes perf script print the
extended debug information as expected.

Signed-off-by: Milian Wolff 
Acked-by: Jiri Olsa 
Tested-by: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/r/20181021191424.16183-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4da5e32b9e03..bd468b90801b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3417,8 +3417,10 @@ int cmd_script(int argc, const char **argv)
exit(-1);
}
 
-   if (!script_name)
+   if (!script_name) {
setup_pager();
+   use_browser = 0;
+   }
 
session = perf_session__new(, false, );
if (session == NULL)


Re: Broken dwarf unwinding - wrong stack pointer register value?

2018-10-23 Thread Milian Wolff
On Dienstag, 23. Oktober 2018 06:03:56 CEST Andi Kleen wrote:
> > So what if my libm wasn't compiled with -fasynchronous-unwind-tables? We
> 
> It's default (64bit since always and 32bit now too) Unless someone disabled
> it.

Excellent, good to know. Since [1] doesn't explicitly disable it, I would 
assume the information should be available.

[1]: https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/glibc

> However libm might be partially written in assembler and hand written
> assembler often has problems with unwind tables because the programmer has
> to get them correct explicitely.

Yes, that could be the case. I'm unsure about the glibc build system and what 
actually gets compiled, but I found a potential source at [2]:

[2]: https://github.com/bminor/glibc/blob/43b1048ab9418e902aac8c834a7a9a88c501620a/sysdeps/ieee754/dbl-64/e_hypot.c

I believe this is what is used on my system, since I can spot calls to 
__issignaling@@GLIBC_2.18 etc. in the disassembly of __hypot_finite ([3]), 
which matches the sources referenced in [2].

[3]: https://paste.kde.org/poywa7y2z

If [2] is used, then it's not hand-written assembly but compiler-generated
code. So shouldn't unwinding work, even from the prologue?

I have since also figured out how to dump the .eh_frame contents in a
human-readable format via readelf. Remember, __hypot_finite on my system is at
offset 0x29660 of libm, so I think the following are the corresponding
.eh_frame contents:

```
$ readelf --debug-dump=frames /usr/lib/libm.so.6 |& less
...
2b60 004c 2b64 FDE cie= pc=00029660..000299ce
  DW_CFA_advance_loc: 6 to 00029666
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r13 (r13) at cfa-16
  DW_CFA_advance_loc: 2 to 00029668
  DW_CFA_def_cfa_offset: 24
  DW_CFA_offset: r12 (r12) at cfa-24
  DW_CFA_advance_loc: 1 to 00029669
  DW_CFA_def_cfa_offset: 32
  DW_CFA_offset: r6 (rbp) at cfa-32
  DW_CFA_advance_loc: 6 to 0002966f
  DW_CFA_def_cfa_offset: 40
  DW_CFA_offset: r3 (rbx) at cfa-40
  DW_CFA_advance_loc: 29 to 0002968c
  DW_CFA_def_cfa_offset: 80
  DW_CFA_advance_loc2: 291 to 000297af
  DW_CFA_remember_state
  DW_CFA_def_cfa_offset: 40
  DW_CFA_advance_loc: 5 to 000297b4
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 1 to 000297b5
  DW_CFA_def_cfa_offset: 24
  DW_CFA_advance_loc: 2 to 000297b7
  DW_CFA_def_cfa_offset: 16
  DW_CFA_advance_loc: 2 to 000297b9
  DW_CFA_def_cfa_offset: 8
  DW_CFA_advance_loc: 7 to 000297c0
  DW_CFA_restore_state
  DW_CFA_advance_loc1: 88 to 00029818
  DW_CFA_remember_state
  DW_CFA_def_cfa_offset: 40
  DW_CFA_advance_loc: 1 to 00029819
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 1 to 0002981a
  DW_CFA_def_cfa_offset: 24
  DW_CFA_advance_loc: 2 to 0002981c
  DW_CFA_def_cfa_offset: 16
  DW_CFA_advance_loc: 2 to 0002981e
  DW_CFA_def_cfa_offset: 8
  DW_CFA_advance_loc: 18 to 00029830
  DW_CFA_restore_state
  DW_CFA_nop
```

I notice that this does not refer to the rsp register at all, even though
the code modifies it, which leads to the issue. See the disassembly at [3]
again, and note that the broken sample frame points at

0x00029688 <+40>:    sub    $0x28,%rsp
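
The FDE above can be read by hand as follows. This assumes the CIE, whose contents are elided in the dump, is the usual x86-64 one: CFA = rsp + 8, with the return address at CFA - 8. Under that assumption, each `DW_CFA_def_cfa_offset` re-expresses the CFA relative to rsp. The minimal C sketch below models only the prologue rows, up to the first `DW_CFA_remember_state` at 0x297af, and looks up the rule for a given PC. The table layout, the initial row, and the helper names are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified CFA table: from `pc` on, CFA = rsp + cfa_offset. */
struct cfa_row {
    uint64_t pc;
    int64_t cfa_offset;
};

/* Prologue rows of the __hypot_finite FDE dumped above. The first row is an
 * ASSUMPTION: the usual x86-64 CIE initial rule (CFA = rsp + 8). */
static const struct cfa_row hypot_rows[] = {
    { 0x29660, 8 },  /* assumed CIE rule at function entry */
    { 0x29666, 16 },
    { 0x29668, 24 },
    { 0x29669, 32 },
    { 0x2966f, 40 },
    { 0x2968c, 80 }, /* after the 4-byte sub $0x28,%rsp at 0x29688 */
};

/* CFA offset relative to rsp in effect at `pc`, or -1 if pc precedes the
 * FDE. Rows after 0x297af (epilogue/remember_state) are not modeled. */
static int64_t cfa_offset_at(uint64_t pc)
{
    int64_t off = -1;
    for (size_t i = 0; i < sizeof hypot_rows / sizeof hypot_rows[0]; i++) {
        if (hypot_rows[i].pc > pc)
            break;
        off = hypot_rows[i].cfa_offset;
    }
    return off;
}

/* Offset from rsp of the slot holding the return address (CFA - 8). */
static int64_t ra_slot_offset(uint64_t pc)
{
    int64_t cfa = cfa_offset_at(pc);
    return cfa < 0 ? -1 : cfa - 8;
}
```

With these rows, the return address is expected at rsp + 32 for the sample at 0x29688 and at rsp + 72 for a PC in the body such as 0x29696. These values line up with the `access_mem ... offset 32` and `offset 72` lines in the `perf script -v` log of the original report. Both samples in that log report the same rsp value.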

Can someone at least confirm whether unwinding from a function prologue via 
.eh_frame (but without .debug_frame) should actually be possible?

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts



Re: Broken dwarf unwinding - wrong stack pointer register value?

2018-10-22 Thread Milian Wolff
On Montag, 22. Oktober 2018 15:58:17 CEST Andi Kleen wrote:
> Milian Wolff  writes:
> > After more digging, it turns out that I've apparently chased a red
> > herring.
> > I'm running archlinux which isn't shipping debug symbols for libm.
> 
> 64bit executables normally have unwind information even when stripped.
> Unless someone forcefully stripped those too.
> 
> You can checkout with objdump --sections.

Right, we do have .eh_frame and .eh_frame_hdr according to readelf:

```
$ readelf  --sections /usr/lib/libm.so.6
There are 26 section headers, starting at offset 0x183120:

Section Headers:
  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  [ 0]   NULL   
        0 0 0
  [ 1] .note.gnu.build-i NOTE 0270  0270
   0024     A   0 0 4
  [ 2] .note.ABI-tag NOTE 0294  0294
   0020     A   0 0 4
  [ 3] .note.gnu.propert NOTE 02b8  02b8
   0020     A   0 0 8
  [ 4] .gnu.hash GNU_HASH 02d8  02d8
   24d0     A   5 0 8
  [ 5] .dynsym   DYNSYM   27a8  27a8
   66c0  0018   A   6 1 8
  [ 6] .dynstr   STRTAB   8e68  8e68
   2352     A   0 0 1
  [ 7] .gnu.version  VERSYM   b1ba  b1ba
   0890  0002   A   5 0 2
  [ 8] .gnu.version_dVERDEF   ba50  ba50
   017c     A   611 8
  [ 9] .gnu.version_rVERNEED  bbd0  bbd0
   0060     A   6 2 8
  [10] .rela.dyn RELA bc30  bc30
   0480  0018   A   5 0 8
  [11] .init PROGBITS d000  d000
   001b    AX   0 0 4
  [12] .text PROGBITS d020  d020
   000a063b    AX   0 0 16
  [13] .fini PROGBITS 000ad65c  000ad65c
   000d    AX   0 0 4
  [14] .rodata   PROGBITS 000ae000  000ae000
   000c76a4     A   0 0 32
  [15] .eh_frame_hdr PROGBITS 001756a4  001756a4
   1c34     A   0 0 4
  [16] .eh_frame PROGBITS 001772d8  001772d8
   93f0     A   0 0 8
  [17] .hash HASH 001806c8  001806c8
   210c  0004   A   5 0 8
  [18] .init_array   INIT_ARRAY   00183c80  00182c80
   0008  0008  WA   0 0 8
  [19] .fini_array   FINI_ARRAY   00183c88  00182c88
   0008  0008  WA   0 0 8
  [20] .dynamic  DYNAMIC  00183c90  00182c90
   01f0  0010  WA   6 0 8
  [21] .got  PROGBITS 00183e80  00182e80
   0180  0008  WA   0 0 8
  [22] .data PROGBITS 00184000  00183000
   000c    WA   0 0 8
  [23] .bss  NOBITS   0018400c  0018300c
   000c    WA   0 0 4
  [24] .comment  PROGBITS   0018300c
   001a  0001  MS   0 0 1
  [25] .shstrtab STRTAB     00183026
   00fa     0 0 1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
```
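
The same section check can also be done programmatically. The following is a minimal sketch that assumes a native-endian ELF64 file and glibc's `<elf.h>`. `elf64_has_section` is a hypothetical helper, not part of perf or binutils:

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF64 file at `path` has a section
 * named `name`, 0 if it does not, -1 on error. Assumes native byte order and
 * ignores the SHN_XINDEX escape used by files with very many sections. */
static int elf64_has_section(const char *path, const char *name)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL, *names;
    char *strtab = NULL;
    int found = -1;

    if (!f)
        return -1;

    /* ELF header: magic, class, and where the section header table lives. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Section header table, i.e. what `readelf --sections` lists. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section-name string table; each sh_name is an offset into it. */
    names = &sh[eh.e_shstrndx];
    strtab = malloc(names->sh_size + 1);
    if (!strtab || fseek(f, (long)names->sh_offset, SEEK_SET) != 0 ||
        fread(strtab, 1, names->sh_size, f) != names->sh_size)
        goto out;
    strtab[names->sh_size] = '\0';

    found = 0;
    for (Elf64_Half i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names->sh_size &&
            strcmp(strtab + sh[i].sh_name, name) == 0) {
            found = 1;
            break;
        }
    }
out:
    free(strtab);
    free(sh);
    fclose(f);
    return found;
}
```

For the libm above, `elf64_has_section("/usr/lib/libm.so.6", ".eh_frame")` should return 1, matching the readelf output. Note that this only proves the section exists, not that it covers every prologue instruction.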

But should that be enough information to be able to unwind from a function 
prologue? I mean, it obviously seems to work when we unwind from the function 
body. But how would I know whether it should work from the prologue too?

Reading e.g. https://www.airs.com/blog/archives/460, I can find:

> There should be exactly one FDE covering each instruction which may be being
> executed when an exception occurs. By default an exception can only o

Re: Broken dwarf unwinding - wrong stack pointer register value?

2018-10-22 Thread Milian Wolff
On Montag, 22. Oktober 2018 12:35:39 CEST Milian Wolff wrote:
> On Sonntag, 21. Oktober 2018 00:39:51 CEST Milian Wolff wrote:
> > Hey all,
> > 
> > I'm on the quest to figure out why perf regularly fails to unwind (some)
> > samples. I am seeing very strange behavior, where an apparently wrong
> > stack pointer value is read from the register - see below for more
> > information and the end of this (long) mail for my open questions. Any
> > help would be greatly appreciated.
> > 
> > I am currently using this trivial C++ code to reproduce the issue:
> > 
> > ```
> > #include 
> > #include 
> > #include 
> > #include 
> > 
> > using namespace std;
> > 
> > int main()
> > {
> >     uniform_real_distribution uniform(-1E5, 1E5);
> >     default_random_engine engine;
> >     double s = 0;
> >     for (int i = 0; i < 1000; ++i) {
> >         s += norm(complex(uniform(engine), uniform(engine)));
> >     }
> >     cout << s << '\n';
> >     return 0;
> > }
> > ```
> > 
> > I compile it with `g++ -O2 -g` and then record it with `perf record
> > --call-graph dwarf`. Using perf script, I then see e.g.:
> > 
> > ```
> > $ perf script -v --no-inline --time 90229.12668,90229.127158 --ns
> > ...
> > # first frame (working unwinding from __hypot_finite):
> > unwind: reg 16, val 7faf7dca2696
> > unwind: reg 7, val 7ffc80811ca0
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811ce8 val 7faf7dc88af9, offset 72
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811d08 val 56382b0fc129, offset 104
> > unwind: find_proc_info dso
> > /home/milian/projects/kdab/rnd/hotspot/build/tests/
> > test-clients/cpp-inlining/cpp-inlining
> > unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 184
> > unwind: find_proc_info dso /usr/lib/libc-2.28.so
> > unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 376
> > unwind: find_proc_info dso
> > /home/milian/projects/kdab/rnd/hotspot/build/tests/
> > test-clients/cpp-inlining/cpp-inlining
> > unwind: __hypot_finite:ip = 0x7faf7dca2696 (0x29696)
> > unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
> > unwind: main:ip = 0x56382b0fc128 (0x1128)
> > unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> > unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> > # second frame (unrelated)
> > unwind: reg 16, val 56382b0fc114
> > unwind: reg 7, val 7ffc80811d10
> > unwind: find_proc_info dso
> > /home/milian/projects/kdab/rnd/hotspot/build/tests/
> > test-clients/cpp-inlining/cpp-inlining
> > unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 72
> > unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 264
> > unwind: main:ip = 0x56382b0fc114 (0x1114)
> > unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> > unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> > # third frame (broken unwinding from __hypot_finite)
> > unwind: reg 16, val 7faf7dca2688
> > unwind: reg 7, val 7ffc80811ca0
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811cc0 val 0, offset 32
> > unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
> > unwind: '':ip = 0x (0x0)
> > 
> > cpp-inlining 24617 90229.126685606: 711026 cycles:uppp:
> >   7faf7dca2696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
> >   7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
> >   56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> >   7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
> >   56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> > 
> > cpp-inlining 24617 90229.126921551: 714657 cycles:uppp:
> >   56382b0fc114 main+0x74 (/home/milian/projects/kdab/rnd/hotspot/
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> >   7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
> >   56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> > 
> > cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
> >   7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
> 

Re: [PATCH 2/2] perf script: flush output stream after events in verbose mode

2018-10-22 Thread Milian Wolff
On Monday, 22 October 2018 12:16:18 CEST Jiri Olsa wrote:
> On Mon, Oct 22, 2018 at 12:09:22PM +0200, Milian Wolff wrote:
> > On Monday, 22 October 2018 11:43:17 CEST Jiri Olsa wrote:
> > > On Sun, Oct 21, 2018 at 09:14:24PM +0200, Milian Wolff wrote:
> > > > When the perf script output is written to a terminal stream,
> > > > the normal output of `perf script` would get buffered, but its
> > > > debug output would be written directly. This made it quite hard
> > > > to figure out where a given debug output is coming from. We can
> > > > improve on this by flushing the output buffer after processing an
> > > > event. To see the value, compare the following output for a
> > > > `perf script -v` run:
> > > > 
> > > > Before this patch:
> > > > ```
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > ... lots and lots of verbose debug output
> > > > 
> > > > cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
> > > > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > > 
> > > > cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
> > > > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > > 
> > > > ...
> > > > ```
> > > > 
> > > > After this patch:
> > > > ```
> > > > ...
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > 
> > > > cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
> > > > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > > 
> > > > unwind: reg 16, val 7faf7dfdc000
> > > > unwind: reg 7, val 7ffc80811e30
> > > > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > > > unwind: reg 6, val 0
> > > > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > > > 
> > > > cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
> > > > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > > > 
> > > > ...
> > > > ```
> > > > 
> > > > This new output format makes it much easier to use perf script
> > > > output for debugging purposes, e.g. to investigate broken dwarf
> > > > unwinding.
> > > 
> > > yep, I plan to check on this ;-)
> > > 
> > > > Signed-off-by: Milian Wolff 
> > > > Cc: Arnaldo Carvalho de Melo 
> > > > ---
> > > > 
> > > >  tools/perf/builtin-script.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > > > index bd468b90801b..ca09b7d2adb7 100644
> > > > --- a/tools/perf/builtin-script.c
> > > > +++ b/tools/perf/builtin-script.c
> > > > @@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
> > > >  	if (PRINT_FIELD(METRIC))
> > > >  		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
> > > >  
> > > > +
> > > > +	if (verbose)
> > > > +		fflush(fp);
> > > 
> > > should we call fflush(NULL) to dump all the streams?
> > > 
> > > the verbose output goes to stderr and fp seems to be stdout by default
> > 
> > stderr isn't buffered, so we don't need to flush it. So personally, I don't
> > see a need to dump all streams - fp should be enough? Can you maybe explain
> > where it would be required to flush more buffers?
> 
> hum, did not know stderr wasn't buffered
>
> I think there's a perf script feature to store the event data in
> separate files, one per event... but I guess we don't need to
> flush them; we just need to keep stdout and stderr in sync, IIUC

Exactly, and that's achieved with this patch from what I see :) Or should we 
maybe instead call 

setbuf(fp, NULL);

in verbose mode?
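A minimal sketch of the `setbuf(fp, NULL)` alternative, outside of perf: `setup_output()` and `print_sample()` are hypothetical names, not perf functions. `setbuf(fp, NULL)` is equivalent to `setvbuf(fp, NULL, _IONBF, 0)`; it has to run before the first write to `fp`, and afterwards every `fprintf()` reaches the file immediately, so no per-event `fflush()` is needed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical setup step: in verbose mode make fp unbuffered (the same
 * effect as setbuf(fp, NULL)), so sample output interleaves with the
 * debug output written to stderr. Must run before fp is first used.
 * Returns 0 on success, non-zero if setvbuf() rejects the request. */
static int setup_output(FILE *fp, int verbose)
{
	assert(fp != NULL);
	if (!verbose)
		return 0;
	return setvbuf(fp, NULL, _IONBF, 0);
}

/* Hypothetical per-sample printer: no explicit fflush() is needed once
 * fp is unbuffered. Returns the fprintf() result (bytes written or < 0). */
static int print_sample(FILE *fp, const char *comm, int pid)
{
	return fprintf(fp, "%s %d\n", comm, pid);
}
```

The trade-off is that an unbuffered stream typically issues one write(2) per output call, whereas the patch keeps buffering and pays for a single flush per event.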

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: Broken dwarf unwinding - wrong stack pointer register value?

2018-10-22 Thread Milian Wolff
On Sunday, 21 October 2018 00:39:51 CEST Milian Wolff wrote:
> Hey all,
> 
> I'm on the quest to figure out why perf regularly fails to unwind (some)
> samples. I am seeing very strange behavior, where an apparently wrong stack
> pointer value is read from the register - see below for more information and
> the end of this (long) mail for my open questions. Any help would be
> greatly appreciated.
> 
> I am currently using this trivial C++ code to reproduce the issue:
> 
> ```
> #include <cmath>
> #include <complex>
> #include <iostream>
> #include <random>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 1000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ```
> 
> I compile it with `g++ -O2 -g` and then record it with
> `perf record --call-graph dwarf`. Using perf script, I then see e.g.:
> 
> ```
> $ perf script -v --no-inline --time 90229.12668,90229.127158 --ns
> ...
> # first frame (working unwinding from __hypot_finite):
> unwind: reg 16, val 7faf7dca2696
> unwind: reg 7, val 7ffc80811ca0
> unwind: find_proc_info dso /usr/lib/libm-2.28.so
> unwind: access_mem addr 0x7ffc80811ce8 val 7faf7dc88af9, offset 72
> unwind: find_proc_info dso /usr/lib/libm-2.28.so
> unwind: access_mem addr 0x7ffc80811d08 val 56382b0fc129, offset 104
> unwind: find_proc_info dso
> /home/milian/projects/kdab/rnd/hotspot/build/tests/
> test-clients/cpp-inlining/cpp-inlining
> unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 184
> unwind: find_proc_info dso /usr/lib/libc-2.28.so
> unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 376
> unwind: find_proc_info dso
> /home/milian/projects/kdab/rnd/hotspot/build/tests/
> test-clients/cpp-inlining/cpp-inlining
> unwind: __hypot_finite:ip = 0x7faf7dca2696 (0x29696)
> unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
> unwind: main:ip = 0x56382b0fc128 (0x1128)
> unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> # second frame (unrelated)
> unwind: reg 16, val 56382b0fc114
> unwind: reg 7, val 7ffc80811d10
> unwind: find_proc_info dso
> /home/milian/projects/kdab/rnd/hotspot/build/tests/
> test-clients/cpp-inlining/cpp-inlining
> unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 72
> unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 264
> unwind: main:ip = 0x56382b0fc114 (0x1114)
> unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> # third frame (broken unwinding from __hypot_finite)
> unwind: reg 16, val 7faf7dca2688
> unwind: reg 7, val 7ffc80811ca0
> unwind: find_proc_info dso /usr/lib/libm-2.28.so
> unwind: access_mem addr 0x7ffc80811cc0 val 0, offset 32
> unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
> unwind: '':ip = 0x (0x0)
> cpp-inlining 24617 90229.126685606: 711026 cycles:uppp:
>   7faf7dca2696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
>   7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
>   56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/
> build/tests/test-clients/cpp-inlining/cpp-inlining)
>   7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
>   56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/
> build/tests/test-clients/cpp-inlining/cpp-inlining)
> 
> cpp-inlining 24617 90229.126921551: 714657 cycles:uppp:
>   56382b0fc114 main+0x74 (/home/milian/projects/kdab/rnd/hotspot/
> build/tests/test-clients/cpp-inlining/cpp-inlining)
>   7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
>   56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/
> build/tests/test-clients/cpp-inlining/cpp-inlining)
> 
> cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
>   7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
>    [unknown] ([unknown])
> ...
> ```
> 
> Now I'm trying to figure out why one __hypot_finite sample works but the
> other one breaks for no apparent reason.
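For readers following the log: on x86-64, `reg 16` and `reg 7` are the DWARF register numbers of the return address (RIP) and of RSP, and the `offset` printed by `access_mem` appears to be the distance from the sampled stack pointer into the user stack copy that `--call-graph dwarf` records (0x7ffc80811ce8 - 0x7ffc80811ca0 = 0x48 = 72). Both `__hypot_finite` samples start from the same RSP, yet the broken one reads its return address from offset 32 and finds 0. The following is a hypothetical model of that lookup, not perf's code; `struct stack_copy` and `read_stack_word()` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the user stack copy attached to a sample. */
struct stack_copy {
	uint64_t sp;         /* sampled RSP (DWARF reg 7 on x86-64) */
	const uint8_t *data; /* bytes copied starting at sp */
	size_t size;         /* number of bytes copied */
};

/*
 * Read the 8-byte word at addr from the copied stack. On success, store
 * the value and its offset from sp and return 0. Return -1 if the word
 * lies outside the copy; a real unwinder would then have to fall back
 * to other memory or give up.
 */
static int read_stack_word(const struct stack_copy *st, uint64_t addr,
			   uint64_t *val, int *offset)
{
	assert(st != NULL && val != NULL && offset != NULL);
	if (addr < st->sp || addr - st->sp > st->size ||
	    st->size - (addr - st->sp) < sizeof(*val))
		return -1;
	*offset = (int)(addr - st->sp);
	/* memcpy avoids an unaligned load from the byte buffer. */
	memcpy(val, st->data + *offset, sizeof(*val));
	return 0;
}
```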

I've now collected some more background information, which I believe is quite
helpful for analyzing this issue:

Note how the broken sample has the IP pointing at __hypot_finite+0x28:

unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
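The two numbers on that line can be cross-checked with a little arithmetic. Assuming the value in parentheses is the IP relative to libm's load address (the matching results for the other libm frames suggest so), subtracting it yields the same load bias, 0x7faf7dc79000, for all three libm addresses in the log, and subtracting the symbol offset shows that both `__hypot_finite` samples fall into the same function starting at 0x29660, the broken one 14 bytes (0x36 - 0x28 = 0xe) before the working one. `load_bias()` and `symbol_start()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for a perf debug line such as
 *   unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
 * assuming the value in parentheses is the IP relative to the DSO.
 */

/* Runtime IP minus DSO-relative IP: where the DSO was loaded. */
static uint64_t load_bias(uint64_t ip, uint64_t dso_rel_ip)
{
	assert(dso_rel_ip <= ip);
	return ip - dso_rel_ip;
}

/* DSO-relative IP minus the "+0x.." of "symbol+0x..": symbol start. */
static uint64_t symbol_start(uint64_t dso_rel_ip, uint64_t sym_off)
{
	assert(sym_off <= dso_rel_ip);
	return dso_rel_ip - sym_off;
}
```

With the values from the log, `load_bias(0x7faf7dca2688, 0x29688)`, `load_bias(0x7faf7dca2696, 0x29696)` and `load_bias(0x7faf7dc88af8, 0xfaf8)` all give 0x7faf7dc79000, and `symbol_start()` gives 0x29660 for both `+0x28` and `+0x36`.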

When I run my reproducer in GDB, obtaining a backtrace from that address works
just fine:

```
$ gdb ./cpp-inlining
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPL

Re: [PATCH 2/2] perf script: flush output stream after events in verbose mode

2018-10-22 Thread Milian Wolff
On Monday, 22 October 2018 11:43:17 CEST Jiri Olsa wrote:
> On Sun, Oct 21, 2018 at 09:14:24PM +0200, Milian Wolff wrote:
> > When the perf script output is written to a terminal stream,
> > the normal output of `perf script` would get buffered, but its
> > debug output would be written directly. This made it quite hard
> > to figure out where a given debug output is coming from. We can
> > improve on this by flushing the output buffer after processing an
> > event. To see the value, compare the following output for a
> > `perf script -v` run:
> > 
> > Before this patch:
> > ```
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > ... lots and lots of verbose debug output
> > 
> > cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > ...
> > ```
> > 
> > After this patch:
> > ```
> > ...
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > 
> > cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > 
> > cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > ...
> > ```
> > 
> > This new output format makes it much easier to use perf script
> > output for debugging purposes, e.g. to investigate broken dwarf
> > unwinding.
> 
> yep, I plan to check on this ;-)
> 
> > Signed-off-by: Milian Wolff 
> > Cc: Arnaldo Carvalho de Melo 
> > ---
> > 
> >  tools/perf/builtin-script.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index bd468b90801b..ca09b7d2adb7 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
> >  	if (PRINT_FIELD(METRIC))
> >  		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
> >  
> > +
> > +	if (verbose)
> > +		fflush(fp);
> 
> should we call fflush(NULL) to dump all the streams?
> 
> the verbose output goes to stderr and fp seems to be stdout by default

stderr isn't buffered, so we don't need to flush it. So personally, I don't 
see a need to dump all streams - fp should be enough? Can you maybe explain 
where it would be required to flush more buffers?
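As background to this exchange: ISO C only guarantees that stderr is not fully buffered (glibc leaves it unbuffered), while stdout is fully buffered when it does not refer to an interactive device, e.g. when piped. The following sketch contrasts flushing only `fp`, as the patch does, with `fflush(NULL)`, which flushes every open output stream. `emit_event()` and `bytes_on_disk()` are hypothetical helpers; the check uses POSIX `fileno()` and `fstat()` to see what has actually reached the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Bytes that have actually reached the file behind fp (POSIX fstat). */
static long bytes_on_disk(FILE *fp)
{
	struct stat st;

	if (fstat(fileno(fp), &st) != 0)
		return -1;
	return (long)st.st_size;
}

/*
 * Hypothetical per-event output: write one line, then flush either only
 * fp (what the patch does) or, with flush_all, every open output stream
 * (the fflush(NULL) variant). Returns 0 on success, -1 on error.
 */
static int emit_event(FILE *fp, const char *line, int flush_all)
{
	assert(fp != NULL && line != NULL);
	if (fputs(line, fp) == EOF)
		return -1;
	return fflush(flush_all ? NULL : fp) == 0 ? 0 : -1;
}
```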

Thanks



Re: [PATCH 2/2] perf script: flush output stream after events in verbose mode

2018-10-22 Thread Milian Wolff
On Montag, 22. Oktober 2018 11:43:17 CEST Jiri Olsa wrote:
> On Sun, Oct 21, 2018 at 09:14:24PM +0200, Milian Wolff wrote:
> > When the perf script output is written to a terminal stream,
> > the normal output of `perf script` would get buffered, but its
> > debug output would be written directly. This made it quite hard
> > to figure out where a given debug output is coming from. We can
> > improve on this by flushing the output buffer after processing an
> > event. To see the value, compare the following output for a
> > `perf script -v` run:
> > 
> > Before this patch:
> > ```
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > ... lots and lots of verbose debug output
> > 
> > cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > ...
> > ```
> > 
> > After this patch:
> > ```
> > ...
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > 
> > cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > unwind: reg 16, val 7faf7dfdc000
> > unwind: reg 7, val 7ffc80811e30
> > unwind: find_proc_info dso /usr/lib/ld-2.28.so
> > unwind: reg 6, val 0
> > unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
> > 
> > cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
> > 7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
> > 
> > ...
> > ```
> > 
> > This new output format makes it much easier to use perf script
> > output for debugging purposes, e.g. to investigate broken dwarf
> > unwinding.
> 
> yep, I plan to check on this ;-)
> 
> > Signed-off-by: Milian Wolff 
> > Cc: Arnaldo Carvalho de Melo 
> > ---
> > 
> >  tools/perf/builtin-script.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index bd468b90801b..ca09b7d2adb7 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -1737,6 +1737,9 @@ static void process_event(struct perf_script
> > *script,
> > 
> > if (PRINT_FIELD(METRIC))
> > 
> > perf_sample__fprint_metric(script, thread, evsel, sample, fp);
> > 
> > +
> > +   if (verbose)
> > +   fflush(fp);
> 
> should we call fflush(NULL) to dump all the streams?
> 
> the verbose output goes to stderr and fp seems to be stdout by default

stderr isn't buffered, so we don't need to flush it. So personally, I don't 
see a need to flush all streams; flushing fp should be enough, no? Can you 
explain where it would be necessary to flush more buffers?

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: Broken dwarf unwinding - wrong stack pointer register value?

2018-10-21 Thread Milian Wolff
On Sunday, 21 October 2018 00:39:51 CEST Milian Wolff wrote:
> Hey all,
> 
> I'm on the quest to figure out why perf regularly fails to unwind (some)
> samples. I am seeing very strange behavior, where an apparently wrong stack
> pointer value is read from the register - see below for more information and
> the end of this (long) mail for my open questions. Any help would be
> greatly appreciated.
> 
> I am currently using this trivial C++ code to reproduce the issue:
> 
> ```
> #include 
> #include 
> #include 
> #include 
> 
> using namespace std;
> 
> int main()
> {
> uniform_real_distribution uniform(-1E5, 1E5);
> default_random_engine engine;
> double s = 0;
> for (int i = 0; i < 1000; ++i) {
> s += norm(complex(uniform(engine), uniform(engine)));
> }
> cout << s << '\n';
> return 0;
> }
> ```
> 
> I compile it with `g++ -O2 -g` and then record it with
> `perf record --call-graph dwarf`. Using perf script, I then see e.g.:

With my patch to regularly flush the perf script output buffer, we can now 
easily find all broken backtraces and the corresponding debug output via:

$ perf script --ns -v |& awk -v RS='' '/\[unknown\]/ {print "\n"$0}'

I've pasted the output of the above command from my machine here:
https://paste.kde.org/pmyxwkk1k

This contains 139 samples with broken unwinding, out of 2350 samples in total, 
so about 6% of all samples are broken.
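
The awk filter relies on paragraph mode. With `-v RS=''`, each record is a run of lines delimited by blank lines, so once the output is flushed per event, a sample's debug lines and its perf script output land in the same record. Below is a rough C sketch of that filter. count_records() is a hypothetical helper, and unlike awk it treats only truly empty lines as separators. For reference, 139 broken samples out of 2350 is 139/2350 ≈ 5.9%.

```c
#include <string.h>

/*
 * Mimic awk paragraph mode (RS=''): records are separated by one or more
 * empty lines. Count all records and those containing "[unknown]", the
 * marker perf script prints for frames it could not resolve.
 */
static void count_records(const char *text, int *total, int *broken)
{
	const char *p = text;

	*total = 0;
	*broken = 0;
	for (;;) {
		const char *end, *hit;

		/* Skip the newlines separating records (and leading ones). */
		while (*p == '\n')
			p++;
		if (*p == '\0')
			break;

		/* A record runs until the next empty line or the end of text. */
		end = strstr(p, "\n\n");
		if (!end)
			end = p + strlen(p);

		(*total)++;
		/* Only count a match that starts inside this record. */
		hit = strstr(p, "[unknown]");
		if (hit && hit < end)
			(*broken)++;

		p = end;
	}
}
```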

In many cases, the first accessed memory is 0 because a too-low offset into 
the stack is computed from the SP value, similar to the scenario I described 
in my initial mail. In other cases we read garbage addresses such as 

unwind: access_mem addr 0x7ffc80811cf0 val 408195dfbda90580, offset 24

In all cases except for the two samples at the very start and end of this 
log, the last offset encountered in access_mem is lower than 72. Remember what 
I wrote in the initial mail: when I manually hacked the access_mem function to 
use 72 for one of the broken samples, unwinding magically worked 
again...

With this additional data, can anyone shed some light on what's potentially 
going on here? How can we improve this situation?

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

smime.p7s
Description: S/MIME cryptographic signature


[PATCH 2/2] perf script: flush output stream after events in verbose mode

2018-10-21 Thread Milian Wolff
When the perf script output is written to a terminal stream,
the normal output of `perf script` would get buffered, but its
debug output would be written directly. This made it quite hard
to figure out where a given debug output is coming from. We can
improve on this by flushing the output buffer after processing an
event. To see the value, compare the following output for a
`perf script -v` run:

Before this patch:
```
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
... lots and lots of verbose debug output
cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)

cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

After this patch:
```
...
unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122036534:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)

unwind: reg 16, val 7faf7dfdc000
unwind: reg 7, val 7ffc80811e30
unwind: find_proc_info dso /usr/lib/ld-2.28.so
unwind: reg 6, val 0
unwind: _start:ip = 0x7faf7dfdc000 (0x2000)
cpp-inlining 24617 90229.122043974:  1 cycles:uppp:
7faf7dfdc000 _start+0x0 (/usr/lib/ld-2.28.so)
...
```

This new output format makes it much easier to use perf script
output for debugging purposes, e.g. to investigate broken dwarf
unwinding.

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index bd468b90801b..ca09b7d2adb7 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1737,6 +1737,9 @@ static void process_event(struct perf_script *script,
 
 	if (PRINT_FIELD(METRIC))
 		perf_sample__fprint_metric(script, thread, evsel, sample, fp);
+
+	if (verbose)
+		fflush(fp);
 }
 
 static struct scripting_ops	*scripting_ops;
-- 
2.19.1


[PATCH 1/2] perf script: allow extended console debug output

2018-10-21 Thread Milian Wolff
The script tool isn't using a browser, yet use_browser
wasn't set explicitly to zero. This in turn led to confusing
output such as:

```
$ perf script -vvv ...
...
overlapping maps in /home/milian/foobar (disable tui for more info)
...
```

Explicitly set use_browser to 0, which gives us the extended
debug information in perf script as expected.

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4da5e32b9e03..bd468b90801b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3417,8 +3417,10 @@ int cmd_script(int argc, const char **argv)
 		exit(-1);
 	}
 
-	if (!script_name)
+	if (!script_name) {
 		setup_pager();
+		use_browser = 0;
+	}
 
 	session = perf_session__new(, false, );
 	if (session == NULL)
-- 
2.19.1
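
A hypothetical sketch of why the explicit assignment matters. It assumes, as the commit message implies, that use_browser is left at a non-zero "undecided" value (modelled here as -1) when perf script never sets up a UI, so any plain `if (use_browser)` check takes the TUI branch. overlap_report() and script_setup() are illustrative stand-ins, not perf's code; only the short message text comes from the output above.

```c
/*
 * Hypothetical stand-in for perf's global flag, assumed to start out as a
 * non-zero "not decided yet" value until some UI setup code clears it.
 */
static int use_browser = -1;

static const char tui_msg[] =
	"overlapping maps in /home/milian/foobar (disable tui for more info)";
static const char extended_msg[] =
	"overlapping maps: <extended map details would be printed here>";

/* Illustrative only: a debug path that trims its output under the TUI. */
static const char *overlap_report(void)
{
	return use_browser ? tui_msg : extended_msg;
}

/* Mirrors the patch: without a script, set up the pager and clear the flag. */
static void script_setup(int with_fix)
{
	/* setup_pager() would run here in perf; it is omitted in this sketch. */
	if (with_fix)
		use_browser = 0;
}
```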


Broken dwarf unwinding - wrong stack pointer register value?

2018-10-20 Thread Milian Wolff
eaningful value...

This offset is calculated from LIBUNWIND__ARCH_REG_SP, cf. unwind-libunwind-local.c.
So is the stack pointer address in the register wrong? If I hackishly offset the
value for the stack pointer by 40, i.e.:

```
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index 79f521a552cf..a766ddaaa4dd 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -502,6 +502,7 @@ static int access_mem(unw_addr_space_t __maybe_unused as,
 	if (ret)
 		return ret;
 
+	start -= 40;
 	end = start + stack->size;
 
 	/* Check overflow. */
```

Then I can successfully unwind the broken sample:

```
$ perf script -v --no-inline --time 90229.127156,90229.127158 --ns
...
unwind: reg 16, val 7faf7dca2688
unwind: reg 7, val 7ffc80811ca0
unwind: find_proc_info dso /usr/lib/libm-2.28.so
unwind: access_mem addr 0x7ffc80811cc0 val 7faf7dc88af9, offset 72
unwind: find_proc_info dso /usr/lib/libm-2.28.so
unwind: access_mem addr 0x7ffc80811ce0 val 56382b0fc129, offset 104
unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
unwind: access_mem addr 0x7ffc80811d30 val 7faf7dabf223, offset 184
unwind: find_proc_info dso /usr/lib/libc-2.28.so
unwind: access_mem addr 0x7ffc80811df0 val 56382b0fc1ee, offset 376
unwind: find_proc_info dso /home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining
unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
unwind: main:ip = 0x56382b0fc128 (0x1128)
unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)
7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-inlining/cpp-inlining)

```
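
The offsets in this log can be checked by hand. The following is a minimal sketch that assumes access_mem() reports `offset = addr - start`, with start taken from the sampled stack pointer (DWARF register 7, i.e. rsp on x86-64, logged as `reg 7, val 7ffc80811ca0`). It models the experimental `start -= 40` as sp_adjust. stack_offset() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Offset into the copied user stack as reported by the debug line
 * "unwind: access_mem addr ... offset N": addr - start, where start is
 * the sampled stack pointer. sp_adjust models the "start -= 40" hack.
 */
static int64_t stack_offset(uint64_t addr, uint64_t sp, uint64_t sp_adjust)
{
	uint64_t start = sp - sp_adjust;

	return (int64_t)(addr - start);
}
```

With the same stack pointer and no adjustment, the first access_mem address (0x7ffc80811cc0) would map to offset 32 instead of 72.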

So, what now? Here are my open questions:

Is this just working now by chance, or is this the real reason? I.e. is the 
register value for the stack pointer incorrectly recorded?

Can this be fixed somehow during record time?

Can we detect this scenario at analysis time and correct the stack pointer 
address automatically somehow? Should the first frame always try to read from 
offset 72 maybe?

Any help would be greatly appreciated, many thanks

-- 
Milian Wolff
m...@milianw.de
http://milianw.de

signature.asc
Description: This is a digitally signed message part.


[tip:perf/urgent] perf report: Don't crash on invalid inline debug information

2018-10-18 Thread tip-bot for Milian Wolff
Commit-ID:  d4046e8e17b9f378cb861982ef71c63911b5dff3
Gitweb: https://git.kernel.org/tip/d4046e8e17b9f378cb861982ef71c63911b5dff3
Author: Milian Wolff 
AuthorDate: Wed, 26 Sep 2018 15:52:07 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 16 Oct 2018 14:52:21 -0300

perf report: Don't crash on invalid inline debug information

When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:

  #0  0x55895c01 in bfd_demangle ()
  #1  0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
  #2  dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
  #3  0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
  #4  inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
  #5  0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
  #6  0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358

So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.

While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:

  $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
  main
  /usr/include/c++/8.2.1/complex:610
  ??
  /usr/include/c++/8.2.1/complex:618
  ??
  /usr/include/c++/8.2.1/complex:675
  ??
  /usr/include/c++/8.2.1/complex:685
  main
  /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there which
should fix this issue:

https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Reported-by: Hadrien Grasland 
Signed-off-by: Milian Wolff 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Fixes: a64489c56c30 ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
  introduced, i.e.  using a2l->funcname without checking if it is NULL,
  but this current patch fixes the current codebase, i.e. multiple csets
  were applied after a64489c56c30 before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/srcline.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 09d6746e6ec8..e767c4a9d4d2 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -85,6 +85,9 @@ static struct symbol *new_inline_sym(struct dso *dso,
 	struct symbol *inline_sym;
 	char *demangled = NULL;
 
+	if (!funcname)
+		funcname = "??";
+
 	if (dso) {
 		demangled = dso__demangle_sym(dso, 0, funcname);
 		if (demangled)
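
A minimal sketch of the guard this patch adds, assuming a NULL function name can reach the demangler as in the backtrace above. is_mangled() is a hypothetical stand-in that, like the crashing bfd_demangle() call, reads its argument unconditionally. inline_frame_name() is likewise hypothetical and mirrors the `funcname = "??"` fallback.

```c
#include <string.h>

/*
 * Hypothetical demangler stand-in: it reads its argument unconditionally,
 * so a NULL name would crash it.
 */
static int is_mangled(const char *name)
{
	return strncmp(name, "_Z", 2) == 0;
}

/*
 * Sketch of the fix: substitute "??" (what addr2line prints for unknown
 * names) before the name reaches the demangler.
 */
static const char *inline_frame_name(const char *funcname, int *mangled)
{
	if (!funcname)
		funcname = "??";
	*mangled = is_mangled(funcname);
	return funcname;
}
```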


Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information

2018-10-16 Thread Milian Wolff
On Tuesday, 16 October 2018 19:52:04 CEST Arnaldo Carvalho de Melo wrote:
> On Tue, Oct 16, 2018 at 02:49:23PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Oct 15, 2018 at 10:51:36PM +0200, Milian Wolff wrote:
> > > On Thursday, 11 October 2018 21:39:20 CEST Arnaldo Carvalho de Melo wrote:
> > > > On Thu, Oct 11, 2018 at 08:23:31PM +0200, Milian Wolff wrote:
> > > > > On Thursday, 27 September 2018 21:10:37 CEST Arnaldo Carvalho de Melo wrote:
> > > > > > On Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff wrote:
> > > > > > > When the function name for an inline frame is invalid, we must
> > > > > > > not try to demangle this symbol, otherwise we crash with:
> > > > > > > 
> > > > > > > #0  0x55895c01 in bfd_demangle ()
> > > > > > > #1  0x55823262 in demangle_sym (dso=0x55d92b90,
> > > > > > > elf_name=0x0,
> > > > > > > kmodule=0) at util/symbol-elf.c:215 #2  dso__demangle_sym
> > > > > > > (dso=dso@entry=0x55d92b90, kmodule=,
> > > > > > > kmodule@entry=0,
> > > > > > > elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400 #3
> > > > > > > 0x557fef4b in new_inline_sym (funcname=0x0,
> > > > > > > base_sym=0x55d92b90, dso=0x55d92b90) at
> > > > > > > util/srcline.c:89 #4
> > > > > > > inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00,
> > > > > > > node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at
> > > > > > > util/srcline.c:264 #5  0x557ff27f in addr2line
> > > > > > > (dso_name=dso_name@entry=0x55d92430
> > > > > > > "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da6603
> > > > > > > 3d24fc
> > > > > > > e5/
> > > > > > > elf", addr=addr@entry=2888, file=file@entry=0x0,>
> > > > > > > 
> > > > > > > line=line@entry=0x0, dso=dso@entry=0x55c7bb00,
> > > > > > > unwind_inlines=unwind_inlines@entry=true,
> > > > > > > node=0x55e31810,
> > > > > > > sym=0x55d92b90) at util/srcline.c:313>
> > > > > > > 
> > > > > > > #6  0x557ffe7c in addr2inlines (sym=0x55d92b90,
> > > > > > > dso=0x55c7bb00, addr=2888, dso_name=0x55d92430
> > > > > > > "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da6603
> > > > > > > 3d24fc
> > > > > > > e5/
> > > > > > > elf")>
> > > > > > > 
> > > > > > > at util/srcline.c:358
> > > > > > > 
> > > > > > > So instead handle the case where we get invalid function names
> > > > > > > for inlined frames and use a fallback '??' function name
> > > > > > > instead.
> > > > > > > 
> > > > > > > While this crash was originally reported by Hadrien for rust
> > > > > > > code,
> > > > > > > I can now also reproduce it with trivial C++ code. Indeed, it
> > > > > > > seems
> > > > > > > like libbfd fails to interpret the debug information for the
> > > > > > > inline
> > > > > > > frame symbol name:
> > > > > > > 
> > > > > > > $ addr2line -e
> > > > > > > /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033
> > > > > > > d24fce
> > > > > > > 5/e
> > > > > > > lf -if b48 main
> > > > > > > /usr/include/c++/8.2.1/complex:610
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:618
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:675
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:685
> > > > > > > main
> > > > > > > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-in
> > > > > > > lining
> > > > > > > /mai
> > > > > > > n.cpp:39

Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information

2018-10-16 Thread Milian Wolff
On Dienstag, 16. Oktober 2018 19:52:04 CEST Arnaldo Carvalho de Melo wrote:
> Em Tue, Oct 16, 2018 at 02:49:23PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Oct 15, 2018 at 10:51:36PM +0200, Milian Wolff escreveu:
> > > On Donnerstag, 11. Oktober 2018 21:39:20 CEST Arnaldo Carvalho de Melo 
wrote:
> > > > Em Thu, Oct 11, 2018 at 08:23:31PM +0200, Milian Wolff escreveu:
> > > > > On Donnerstag, 27. September 2018 21:10:37 CEST Arnaldo Carvalho de
> > > > > Melo
> > > > > 
> > > > > wrote:
> > > > > > Em Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff escreveu:
> > > > > > > When the function name for an inline frame is invalid, we must
> > > > > > > not try to demangle this symbol, otherwise we crash with:
> > > > > > > 
> > > > > > > #0  0x55895c01 in bfd_demangle ()
> > > > > > > #1  0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
> > > > > > > #2  dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
> > > > > > > #3  0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
> > > > > > > #4  inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
> > > > > > > #5  0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
> > > > > > > #6  0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358
> > > > > > > 
> > > > > > > So instead handle the case where we get invalid function names
> > > > > > > for inlined frames and use a fallback '??' function name
> > > > > > > instead.
> > > > > > > 
> > > > > > > While this crash was originally reported by Hadrien for rust
> > > > > > > code, I can now also reproduce it with trivial C++ code. Indeed,
> > > > > > > it seems like libbfd fails to interpret the debug information
> > > > > > > for the inline frame symbol name:
> > > > > > > 
> > > > > > > $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48 main
> > > > > > > /usr/include/c++/8.2.1/complex:610
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:618
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:675
> > > > > > > ??
> > > > > > > /usr/include/c++/8.2.1/complex:685
> > > > > > > main
> > > > > > > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information

2018-10-15 Thread Milian Wolff
On Thursday, 11 October 2018 21:39:20 CEST, Arnaldo Carvalho de Melo wrote:
> On Thu, Oct 11, 2018 at 08:23:31PM +0200, Milian Wolff wrote:
> > On Thursday, 27 September 2018 21:10:37 CEST, Arnaldo Carvalho de Melo wrote:
> > > On Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff wrote:
> > > > When the function name for an inline frame is invalid, we must
> > > > not try to demangle this symbol, otherwise we crash with:
> > > > 
> > > > #0  0x55895c01 in bfd_demangle ()
> > > > #1  0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
> > > > #2  dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
> > > > #3  0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
> > > > #4  inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
> > > > #5  0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
> > > > #6  0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358
> > > > 
> > > > So instead handle the case where we get invalid function names
> > > > for inlined frames and use a fallback '??' function name instead.
> > > > 
> > > > While this crash was originally reported by Hadrien for rust code,
> > > > I can now also reproduce it with trivial C++ code. Indeed, it seems
> > > > like libbfd fails to interpret the debug information for the inline
> > > > frame symbol name:
> > > > 
> > > > $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48 main
> > > > /usr/include/c++/8.2.1/complex:610
> > > > ??
> > > > /usr/include/c++/8.2.1/complex:618
> > > > ??
> > > > /usr/include/c++/8.2.1/complex:675
> > > > ??
> > > > /usr/include/c++/8.2.1/complex:685
> > > > main
> > > > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
> > > > 
> > > > I've reported this bug upstream and also attached a patch there
> > > > which should fix this issue:
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=23715
> > > 
> > > Milian, what about this one, which is the cset it is fixing?
> > 
> > Hey Arnaldo,
> > 
> > just noticed this email and that the corresponding patch hasn't landed in
> > perf/core yet. The patch set which introduced this is a64489c56c307 ("perf
> > report: Find the inline stack for a given address"). Note that the code was
> > introduced by this patch, but then subsequently touched and moved by follow
> > up patches. So, is this the patch you want to see referenced? Otherwise, the
> > latest patch which gets fixed is afaik: 7285cf3325b4a ("perf srcline: Show
> > correct function name for srcline of callchains").
> > 
> > Can you please pick either of these patches and amend the commit message
> > of my patch and push it to perf/urgent and perf/core?
> 
> I'll reread all this later or tomorrow and continue, going AFK now.

Ping?

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 3/3] perf report: don't crash on invalid inline debug information

2018-10-11 Thread Milian Wolff
On Thursday, 27 September 2018 21:10:37 CEST, Arnaldo Carvalho de Melo wrote:
> On Wed, Sep 26, 2018 at 03:52:07PM +0200, Milian Wolff wrote:
> > When the function name for an inline frame is invalid, we must
> > not try to demangle this symbol, otherwise we crash with:
> > 
> > #0  0x55895c01 in bfd_demangle ()
> > #1  0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
> > #2  dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
> > #3  0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
> > #4  inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
> > #5  0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
> > #6  0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358
> > 
> > So instead handle the case where we get invalid function names
> > for inlined frames and use a fallback '??' function name instead.
> > 
> > While this crash was originally reported by Hadrien for rust code,
> > I can now also reproduce it with trivial C++ code. Indeed, it seems
> > like libbfd fails to interpret the debug information for the inline
> > frame symbol name:
> > 
> > $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48 main
> > /usr/include/c++/8.2.1/complex:610
> > ??
> > /usr/include/c++/8.2.1/complex:618
> > ??
> > /usr/include/c++/8.2.1/complex:675
> > ??
> > /usr/include/c++/8.2.1/complex:685
> > main
> > /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
> > 
> > I've reported this bug upstream and also attached a patch there
> > which should fix this issue:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=23715
> 
> Milian, what about this one, which is the cset it is fixing?

Hey Arnaldo,

just noticed this email and that the corresponding patch hasn't landed in 
perf/core yet. The patch set which introduced this is a64489c56c307 ("perf 
report: Find the inline stack for a given address"). Note that the code was 
introduced by this patch, but then subsequently touched and moved by follow up 
patches. So, is this the patch you want to see referenced? Otherwise, the 
latest patch which gets fixed is afaik: 7285cf3325b4a ("perf srcline: Show 
correct function name for srcline of callchains").

Can you please pick either of these patches and amend the commit message of my 
patch and push it to perf/urgent and perf/core?

Cheers

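The fix described in the quoted patch boils down to never handing a NULL inline-frame name to the demangler. A minimal C sketch of that guard follows; `toy_demangle()` and `inline_frame_name()` are hypothetical stand-ins written for illustration, not perf's actual `new_inline_sym()` / `dso__demangle_sym()` code, and the toy demangler only copies its input instead of decoding it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a demangler. Like bfd_demangle() in the
 * backtrace above, it must never be called with NULL; the assert makes
 * that precondition explicit instead of crashing somewhere deeper.
 */
static const char *toy_demangle(const char *mangled, char *buf, size_t len)
{
	assert(mangled != NULL);
	snprintf(buf, len, "%s", mangled);	/* a real demangler would decode */
	return buf;
}

/*
 * Name to show for an inline frame: libbfd may report no function name,
 * so fall back to "??" (as addr2line prints) rather than demangling NULL.
 */
static const char *inline_frame_name(const char *funcname,
				     char *buf, size_t len)
{
	if (funcname == NULL)
		return "??";
	return toy_demangle(funcname, buf, len);
}
```

Checking for NULL at the single place that produces the display name, rather than in each caller, keeps the fallback in one spot if the surrounding code is moved again.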

Re: [PATCH] perf record: use unmapped IP for inline callchain cursors

2018-10-08 Thread Milian Wolff
On Friday, 5 October 2018 15:48:31 CEST, Arnaldo Carvalho de Melo wrote:
> On Wed, Oct 03, 2018 at 09:05:37AM +0530, Ravi Bangoria wrote:
> > LGTM.
> > 
> > Tested-by: Ravi Bangoria 
> 
> So, I've added this as a 'git rebase -i' 'fixup', i.e. kept the commit
> log message for the patch this patch fixes, and combined the two into
> just one patch so that we don't pollute the bisect history, since this
> hasn't made it yet to tip, and I also added Ravi's Tested-by, since this
> tests both.

Thanks a lot for the cleanup work Arnaldo.

Cheers



[tip:perf/urgent] perf record: Use unmapped IP for inline callchain cursors

2018-10-05 Thread tip-bot for Milian Wolff
Commit-ID:  7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf
Gitweb: https://git.kernel.org/tip/7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf
Author: Milian Wolff 
AuthorDate: Wed, 26 Sep 2018 15:52:06 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 5 Oct 2018 11:18:09 -0300

perf record: Use unmapped IP for inline callchain cursors

Only use the mapped IP to find inline frames, but keep using the
unmapped IP for the callchain cursor. This ensures we properly show the
unmapped IP when displaying a frame we received via the
dso__parse_addr_inlines API for a module which does not contain
sufficient debug symbols to show the srcline.

This is another follow-up to commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

Signed-off-by: Milian Wolff 
Acked-by: Jiri Olsa 
Tested-by: Ravi Bangoria 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Jin Yao 
Cc: Namhyung Kim 
Cc: Sandipan Das 
Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets")
Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wo...@kdab.com
Link: http://lkml.kernel.org/r/20181002073949.3297-1-milian.wo...@kdab.com
[ Squashed a fix from Milian for a problem reported by Ravi, fixed up space damage ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0cb4f8bf3ca7..111ae858cbcb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2286,7 +2286,8 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!symbol_conf.inline_name || !map || !sym)
 		return ret;
 
-	addr = map__rip_2objdump(map, ip);
+	addr = map__map_ip(map, ip);
+	addr = map__rip_2objdump(map, addr);
 
 	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
 	if (!inline_node) {


[tip:perf/urgent] perf report: Don't try to map ip to invalid map

2018-10-05 Thread tip-bot for Milian Wolff
Commit-ID:  ff4ce2885af8f9e8e99864d78dbeb4673f089c76
Gitweb: https://git.kernel.org/tip/ff4ce2885af8f9e8e99864d78dbeb4673f089c76
Author: Milian Wolff 
AuthorDate: Wed, 26 Sep 2018 15:52:05 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 27 Sep 2018 16:05:43 -0300

perf report: Don't try to map ip to invalid map

Fixes a crash when the report encounters an address that could not be
associated with an mmaped region:

  #0  0x557bdc4a in callchain_srcline (ip=, sym=0x0, map=0x0) at util/machine.c:2329
  #1  unwind_entry (entry=entry@entry=0x7fff9180, arg=arg@entry=0x75642498) at util/machine.c:2329
  #2  0x558370af in entry (arg=0x75642498, cb=0x557bdb50 , thread=, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
  #3  get_entries (ui=ui@entry=0x7fff9620, cb=0x557bdb50 , arg=0x75642498, max_stack=) at util/unwind-libunwind-local.c:703
  #4  0x55837192 in _unwind__get_entries (cb=, arg=, thread=, data=, max_stack=) at util/unwind-libunwind-local.c:725
  #5  0x557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fff9830, evsel=0x55c7b3b0, cursor=0x75642498, thread=0x55c7f6f0) at util/machine.c:2351
  #6  thread__resolve_callchain (thread=0x55c7f6f0, cursor=0x75642498, evsel=0x55c7b3b0, sample=0x7fff9830, parent=0x7fff97b8, root_al=0x7fff9750, max_stack=127) at util/machine.c:2378
  #7  0x557ba4ee in sample__resolve_callchain (sample=, cursor=, parent=parent@entry=0x7fff97b8, evsel=, al=al@entry=0x7fff9750, max_stack=) at util/callchain.c:1085

Signed-off-by: Milian Wolff 
Tested-by: Sandipan Das 
Acked-by: Jiri Olsa 
Cc: Jin Yao 
Cc: Namhyung Kim 
Fixes: 2a9d5050dc84 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c4acd2001db0..0cb4f8bf3ca7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2312,7 +2312,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
 	const char *srcline = NULL;
-	u64 addr;
+	u64 addr = entry->ip;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
@@ -2324,7 +2324,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
 	 */
-	addr = map__map_ip(entry->map, entry->ip);
+	if (entry->map)
+		addr = map__map_ip(entry->map, entry->ip);
 
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,

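The fix above gives `addr` a safe default (the raw IP) and only converts it to a binary-relative offset when a map exists. The sketch below models that pattern in standalone C; `struct toy_map`, `toy_map_ip()` and `entry_addr()` are hypothetical simplifications, and the conversion assumes the usual `ip - start + pgoff` formula rather than reproducing perf's real `struct map`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of a memory mapping. */
struct toy_map {
	uint64_t start;	/* virtual address where the mapping begins */
	uint64_t pgoff;	/* file offset that is mapped at 'start' */
};

/* Virtual address -> offset in the mapped file (assumed formula). */
static uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	/* Calling this with NULL is the kind of dereference the fix avoids. */
	assert(map != NULL);
	return ip - map->start + map->pgoff;
}

/*
 * Pattern from the fix: default to the raw IP and translate it only
 * when the unwinder associated the frame with a mapping.
 */
static uint64_t entry_addr(const struct toy_map *map, uint64_t ip)
{
	uint64_t addr = ip;

	if (map)
		addr = toy_map_ip(map, ip);
	return addr;
}
```

With no map, an unresolvable IP such as the one in frame #2 of the backtrace (18446744073709551615, i.e. `UINT64_MAX`) simply passes through unchanged instead of being dereferenced through a NULL map.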

[PATCH] perf record: use unmapped IP for inline callchain cursors

2018-10-02 Thread Milian Wolff
Only use the mapped IP to find inline frames, but keep
using the unmapped IP for the callchain cursor. This
ensures we properly show the unmapped IP when displaying
a frame we received via the dso__parse_addr_inlines API
for a module which does not contain sufficient debug symbols
to show the srcline.

Before:
$ perf record -e cycles:u --call-graph ls
$ perf script
...
ls 12853  2735.563911:  43354 cycles:u:
           17878 __GI___tunables_init+0x01d1d63a0118 (/usr/lib/ld-2.28.so)
           19ee9 _dl_sysdep_start+0x01d1d63a02e9 (/usr/lib/ld-2.28.so)
            3087 _dl_start+0x01d1d63a0287 (/usr/lib/ld-2.28.so)
            2007 _start+0x01d1d63a0007 (/usr/lib/ld-2.28.so)

After:

$ perf script
...
ls 12853  2735.563911:  43354 cycles:u:
    7f1714e46878 __GI___tunables_init+0x118 (/usr/lib/ld-2.28.so)
    7f1714e48ee9 _dl_sysdep_start+0x2e9 (/usr/lib/ld-2.28.so)
    7f1714e32087 _dl_start+0x287 (/usr/lib/ld-2.28.so)
    7f1714e31007 _start+0x7 (/usr/lib/ld-2.28.so)

For frames with sufficient debug symbols, the behavior is
still sane and works as expected in my tests.

This patch series shows that we desperately need
an automated test for inline frame resolution. I'll try to
come up with something for the various regressions in the future.

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
Reported-by: Ravi Bangoria 
# Tested-by:
# Reviewed-by:
# Suggested-by:
Fixes: bfe16b0653 ("perf report: Don't crash on invalid inline debug information")
---
 tools/perf/util/machine.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 73a651f10a0f..111ae858cbcb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2286,7 +2286,8 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!symbol_conf.inline_name || !map || !sym)
 		return ret;
 
-	addr = map__rip_2objdump(map, ip);
+	addr = map__map_ip(map, ip);
+	addr = map__rip_2objdump(map, addr);
 
 	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
 	if (!inline_node) {
@@ -2317,6 +2318,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
+	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+		return 0;
+
 	/*
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
@@ -2324,9 +2328,6 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (entry->map)
 		addr = map__map_ip(entry->map, entry->ip);
 
-	if (append_inlines(cursor, entry->map, entry->sym, addr) == 0)
-		return 0;
-
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-- 
2.19.0

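The point of this patch is that two different addresses are in play: the inline-frame lookup needs a binary-relative address, while the callchain cursor (and therefore the `perf script` output) should keep the original virtual IP. Below is a minimal sketch of that split using hypothetical types (`struct toy_map`, `struct toy_frame`, `toy_append_inline()`), assuming the same `ip - start + pgoff` conversion as `map__map_ip()`; the real `append_inlines()` additionally applies `map__rip_2objdump()`, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping (same idea as perf's struct map). */
struct toy_map {
	uint64_t start;	/* virtual start address of the mapping */
	uint64_t pgoff;	/* file offset mapped at 'start' */
};

/* Hypothetical callchain entry: one address for display, one for lookup. */
struct toy_frame {
	uint64_t ip;		/* unmapped (virtual) IP, shown by perf script */
	uint64_t lookup_addr;	/* binary-relative address used to find inlines */
};

static uint64_t toy_map_ip(const struct toy_map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/*
 * Takes the raw IP, as append_inlines() does after the patch: the mapped
 * address is derived locally for the lookup only, and the raw IP is what
 * gets stored in the appended frame.
 */
static struct toy_frame toy_append_inline(const struct toy_map *map,
					  uint64_t ip)
{
	struct toy_frame f;

	assert(map != NULL);
	f.ip = ip;
	f.lookup_addr = toy_map_ip(map, ip);
	return f;
}
```

As a worked example, a mapping assumed (for illustration only) to start at 0x7f1714e2f000 with `pgoff` 0 maps the "After" address 0x7f1714e46878 to 0x17878, which is the value the "Before" output printed in its place.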

[PATCH] perf record: use unmapped IP for inline callchain cursors

2018-10-02 Thread Milian Wolff
Only use the mapped IP to find inline frames, but keep
using the unmapped IP for the callchain cursor. This
ensures we properly show the unmapped IP when displaying
a frame we received via the dso__parse_addr_inlines API
for a module which does not contain sufficient debug symbols
to show the srcline.

Before:
$ perf record -e cycles:u --call-graph ls
$ perf script
...
ls 12853  2735.563911:  43354 cycles:u:
   17878 __GI___tunables_init+0x01d1d63a0118 
(/usr/lib/ld-2.28.so)
   19ee9 _dl_sysdep_start+0x01d1d63a02e9 
(/usr/lib/ld-2.28.so)
3087 _dl_start+0x01d1d63a0287 (/usr/lib/ld-2.28.so)
2007 _start+0x01d1d63a0007 (/usr/lib/ld-2.28.so)

After:

$ perf script
...
ls 12853  2735.563911:  43354 cycles:u:
7f1714e46878 __GI___tunables_init+0x118 (/usr/lib/ld-2.28.so)
7f1714e48ee9 _dl_sysdep_start+0x2e9 (/usr/lib/ld-2.28.so)
7f1714e32087 _dl_start+0x287 (/usr/lib/ld-2.28.so)
7f1714e31007 _start+0x7 (/usr/lib/ld-2.28.so)

For frames with sufficient debug symbols, the behavior is
still sane and works as expected in my tests.

This patch series shows that we desperately need
an automated test for inline frame resolution. I'll try to
come up with something for the various regressions in the future.

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
Reported-by: Ravi Bangoria 
# Tested-by:
# Reviewed-by:
# Suggested-b:
Fixes: bfe16b0653 ("perf report: Don't crash on invalid inline debug 
information")
---
 tools/perf/util/machine.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 73a651f10a0f..111ae858cbcb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2286,7 +2286,8 @@ static int append_inlines(struct callchain_cursor *cursor,
 	if (!symbol_conf.inline_name || !map || !sym)
 		return ret;
 
-	addr = map__rip_2objdump(map, ip);
+	addr = map__map_ip(map, ip);
+	addr = map__rip_2objdump(map, addr);
 
 	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
 	if (!inline_node) {
@@ -2317,6 +2318,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
+	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+		return 0;
+
 	/*
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
@@ -2324,9 +2328,6 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (entry->map)
 		addr = map__map_ip(entry->map, entry->ip);
 
-	if (append_inlines(cursor, entry->map, entry->sym, addr) == 0)
-		return 0;
-
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-- 
2.19.0
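To make the mapped/unmapped distinction concrete, here is a minimal model. It assumes a hypothetical `struct map_model` holding only a start address and a page offset, and mirrors just the basic arithmetic of perf's `map__map_ip()`/`map__unmap_ip()`; it ignores `map__rip_2objdump()` and the other adjustments perf applies, so treat it as a sketch rather than perf's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for perf's struct map. */
struct map_model {
	uint64_t start; /* virtual address where the mapping begins */
	uint64_t pgoff; /* file offset of the first mapped byte */
};

/* Unmapped (virtual) IP -> offset in the binary, cf. map__map_ip(). */
uint64_t model_map_ip(const struct map_model *map, uint64_t ip)
{
	assert(ip >= map->start); /* the IP must lie inside the mapping */
	return ip - map->start + map->pgoff;
}

/* Offset in the binary -> unmapped (virtual) IP, cf. map__unmap_ip(). */
uint64_t model_unmap_ip(const struct map_model *map, uint64_t addr)
{
	return addr + map->start - map->pgoff;
}
```

For the first frame above, assuming ld-2.28.so is mapped at 0x7f1714e2f000 with a page offset of 0 (inferred by subtracting 0x17878 from 0x7f1714e46878), `model_map_ip()` yields 0x17878, the value the broken output displayed. After the patch that value is only used as the lookup key, while the cursor keeps 0x7f1714e46878.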


Re: [RFC 00/10] perf: Add cputime events/metrics

2018-09-26 Thread Milian Wolff
On Thursday, June 7, 2018 1:10:18 AM CEST Andi Kleen wrote:
> > I had some issues with IDLE counter being miscounted due to stopping
> > of the idle tick. I tried to solve it in this patch (it's part of the
> > patchset):
> >   perf/cputime: Don't stop idle tick if there's live cputime event
> > 
> > but I'm pretty sure it's wrong and there's a better solution.
> 
> At least on Intel we already have hardware counters for different idle
> states. You just would need to add them and convert to the same
> unit.
> 
> But of course it's still useful when this is not available.
> 
> > My current plan is now to read those counters in perf top/record/report
> > to show (at least) the idle percentage for the current profile.
> 
> It's useful. Thanks for working on it. I was thinking about doing
> something similar for some time.

Hey Jiri,

what happened to this patch series? I also believe it's super useful, even if
it's not perfect yet.

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Re: [PATCH 1/3] perf report: don't try to map ip to invalid map

2018-09-26 Thread Milian Wolff
On Wednesday, September 26, 2018 4:18:19 PM CEST Arnaldo Carvalho de Melo wrote:
> On Wed, Sep 26, 2018 at 03:52:05PM +0200, Milian Wolff wrote:
> > Fixes a crash when the report encounters an address that
> > could not be associated with an mmaped region:
> 
> Milian, can you spot which cset introduced this problem? So that we can
> add a "Fixes: sha" tag in this (and the others, if needed) to help the
> stable kernel maintainers to find which kernels this has to be
> backported to?

The issue was introduced by commit 2a9d5050dc84 ("perf script: Show correct
offsets for DWARF-based unwinding").

That commit has already been backported a few times, and the backports
reference it as "Upstream commit 2a9d5050dc84fa2060f08a52f632976923e0fa7e".

Is that enough, or do you need me to find all the backported shas too?
-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


[PATCH 1/3] perf report: don't try to map ip to invalid map

2018-09-26 Thread Milian Wolff
Fixes a crash when the report encounters an address that
could not be associated with an mmaped region:

#0  0x557bdc4a in callchain_srcline (ip=<optimized out>, sym=0x0, map=0x0) at util/machine.c:2329
#1  unwind_entry (entry=entry@entry=0x7fff9180, arg=arg@entry=0x75642498) at util/machine.c:2329
#2  0x558370af in entry (arg=0x75642498, cb=0x557bdb50 <unwind_entry>, thread=<optimized out>, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
#3  get_entries (ui=ui@entry=0x7fff9620, cb=0x557bdb50 <unwind_entry>, arg=0x75642498, max_stack=<optimized out>) at util/unwind-libunwind-local.c:703
#4  0x55837192 in _unwind__get_entries (cb=<optimized out>, arg=<optimized out>, thread=<optimized out>, data=<optimized out>, max_stack=<optimized out>) at util/unwind-libunwind-local.c:725
#5  0x557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fff9830, evsel=0x55c7b3b0, cursor=0x75642498, thread=0x55c7f6f0) at util/machine.c:2351
#6  thread__resolve_callchain (thread=0x55c7f6f0, cursor=0x75642498, evsel=0x55c7b3b0, sample=0x7fff9830, parent=0x7fff97b8, root_al=0x7fff9750, max_stack=127) at util/machine.c:2378
#7  0x557ba4ee in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff97b8, evsel=<optimized out>, al=al@entry=0x7fff9750, max_stack=<optimized out>) at util/callchain.c:1085

Signed-off-by: Milian Wolff 
Cc: Sandipan Das 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c4acd2001db0..0cb4f8bf3ca7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2312,7 +2312,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
 	const char *srcline = NULL;
-	u64 addr;
+	u64 addr = entry->ip;
 
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
@@ -2324,7 +2324,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
 	 */
-	addr = map__map_ip(entry->map, entry->ip);
+	if (entry->map)
+		addr = map__map_ip(entry->map, entry->ip);
 
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
-- 
2.19.0
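The essence of this fix is to keep the raw IP whenever no map is known. A sketch of that guarded conversion, using a hypothetical `struct map_model` and `entry_addr()` helper rather than perf's real types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for perf's struct map. */
struct map_model {
	uint64_t start; /* virtual address where the mapping begins */
	uint64_t pgoff; /* file offset of the first mapped byte */
};

/*
 * Convert an unwound IP to a file offset only if its mapping is known.
 * Without a map, keep the raw IP instead of translating through NULL.
 */
uint64_t entry_addr(const struct map_model *map, uint64_t ip)
{
	uint64_t addr = ip; /* default, like "u64 addr = entry->ip;" */

	if (map) {
		assert(ip >= map->start); /* IP must lie inside the mapping */
		addr = ip - map->start + map->pgoff;
	}
	return addr;
}
```

An IP such as 18446744073709551615 (all bits set, as in frame #2 above) that has no map is now passed through unchanged.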


[PATCH 2/3] perf report: use the offset address to find inline frames

2018-09-26 Thread Milian Wolff
To correctly find inlined frames, we have to use the file offset
instead of the virtual memory address. This was already fixed for
displaying srcline information in commit 2a9d5050dc84fa20 ("perf
script: Show correct offsets for DWARF-based unwinding"). We just
need to use the same corrected address when trying to find inline
frames as well.

This is another follow-up to commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

Signed-off-by: Milian Wolff 
Cc: Sandipan Das 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/util/machine.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0cb4f8bf3ca7..73a651f10a0f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2317,9 +2317,6 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 
-	if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
-		return 0;
-
 	/*
 	 * Convert entry->ip from a virtual address to an offset in
 	 * its corresponding binary.
@@ -2327,6 +2324,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (entry->map)
 		addr = map__map_ip(entry->map, entry->ip);
 
+	if (append_inlines(cursor, entry->map, entry->sym, addr) == 0)
+		return 0;
+
 	srcline = callchain_srcline(entry->map, entry->sym, addr);
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-- 
2.19.0
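Why the lookup key matters can be shown with a toy table of inline records keyed by file offset, standing in for the DWARF-derived data perf consults. Everything in this sketch (`struct inline_entry`, the table contents, `find_inline()` and `find_inline_for_ip()`) is hypothetical; it only demonstrates that a virtual address misses such a table while the corresponding file offset hits it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inline-frame record, keyed by an offset within the binary. */
struct inline_entry {
	uint64_t offset;
	const char *funcname;
};

/* Hypothetical table, as if built from the binary's debug information. */
static const struct inline_entry inline_table[] = {
	{ 0x1140, "hypothetical_inlined_fn" },
	{ 0x1210, "another_inlined_fn" },
};

/* Linear lookup; returns NULL when no record matches the key. */
const char *find_inline(uint64_t key)
{
	size_t n = sizeof(inline_table) / sizeof(inline_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (inline_table[i].offset == key)
			return inline_table[i].funcname;
	}
	return NULL;
}

/* Translate the virtual address to a file offset first, then look it up. */
const char *find_inline_for_ip(uint64_t ip, uint64_t map_start, uint64_t pgoff)
{
	assert(ip >= map_start); /* IP must lie inside the mapping */
	return find_inline(ip - map_start + pgoff);
}
```

With a binary mapped at 0x555555554000 (a hypothetical PIE load address) and a page offset of 0, looking up 0x555555555140 directly finds nothing, while translating it to 0x1140 first finds the record.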


[PATCH 3/3] perf report: don't crash on invalid inline debug information

2018-09-26 Thread Milian Wolff
When the function name for an inline frame is invalid, we must
not try to demangle this symbol, otherwise we crash with:

#0  0x55895c01 in bfd_demangle ()
#1  0x55823262 in demangle_sym (dso=0x55d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
#2  dso__demangle_sym (dso=dso@entry=0x55d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
#3  0x557fef4b in new_inline_sym (funcname=0x0, base_sym=0x55d92b90, dso=0x55d92b90) at util/srcline.c:89
#4  inline_list__append_dso_a2l (dso=dso@entry=0x55c7bb00, node=node@entry=0x55e31810, sym=sym@entry=0x55d92b90) at util/srcline.c:264
#5  0x557ff27f in addr2line (dso_name=dso_name@entry=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x55c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x55e31810, sym=0x55d92b90) at util/srcline.c:313
#6  0x557ffe7c in addr2inlines (sym=0x55d92b90, dso=0x55c7bb00, addr=2888, dso_name=0x55d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358

So handle the case where we get an invalid function name for an
inlined frame and fall back to a '??' function name instead.

While this crash was originally reported by Hadrien for Rust code,
I can now also reproduce it with trivial C++ code. Indeed, it seems
that libbfd fails to interpret the debug information for the inline
frame symbol name:

$ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
main
/usr/include/c++/8.2.1/complex:610
??
/usr/include/c++/8.2.1/complex:618
??
/usr/include/c++/8.2.1/complex:675
??
/usr/include/c++/8.2.1/complex:685
main
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39

I've reported this bug upstream and also attached a patch there
which should fix this issue:
https://sourceware.org/bugzilla/show_bug.cgi?id=23715

Signed-off-by: Milian Wolff 
Cc: Arnaldo Carvalho de Melo 
Reported-by: Hadrien Grasland 
---
 tools/perf/util/srcline.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 09d6746e6ec8..e767c4a9d4d2 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -85,6 +85,9 @@ static struct symbol *new_inline_sym(struct dso *dso,
 	struct symbol *inline_sym;
 	char *demangled = NULL;
 
+	if (!funcname)
+		funcname = "??";
+
 	if (dso) {
 		demangled = dso__demangle_sym(dso, 0, funcname);
 		if (demangled)
-- 
2.19.0
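The same guard in isolation: the substitution has to happen before the name reaches the demangler. In this sketch `demangle_stub()` is a hypothetical stand-in for `dso__demangle_sym()` (which, per the backtrace above, ends up in libbfd's `bfd_demangle()`). It returns NULL for names that are not Itanium C++ mangled names (prefix `_Z`), and the caller keeps the original name in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for dso__demangle_sym(): returns NULL when the
 * name is not mangled. Like the real demangler, it dereferences its
 * argument, so NULL must never reach it.
 */
const char *demangle_stub(const char *name)
{
	assert(name != NULL);
	return strncmp(name, "_Z", 2) == 0 ? "<demangled>" : NULL;
}

/* Same shape as the patched new_inline_sym(): substitute "??" first. */
const char *inline_sym_name(const char *funcname)
{
	const char *demangled;

	if (!funcname)
		funcname = "??";
	demangled = demangle_stub(funcname);
	return demangled ? demangled : funcname;
}
```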


Re: [PATCH] perf script: Show correct offsets for DWARF-based unwinding

2018-07-25 Thread Milian Wolff
On Monday, 9 July 2018 16:25:07 CEST Jiri Olsa wrote:
> On Tue, Jul 03, 2018 at 05:35:55PM +0530, Sandipan Das wrote:
> 
> SNIP
> 
> > After:
> >   # perf report --stdio --no-children -s sym,srcline -g address
> >
> >   # Samples: 1  of event 'probe_libc:inet_pton'
> >   # Event count (approx.): 1
> >   #
> >   # Overhead  Symbol  Source:Line
> >   #     ...
> >   #
> >
> >      100.00%  [.] __GI___inet_pton  inet_pton.c
> >               ---gaih_inet.constprop.7 getaddrinfo.c:537
> >                  getaddrinfo getaddrinfo.c:2304
> >                  main ping.c:519
> >                  generic_start_main.isra.0 libc-start.c:308
> >                  __libc_start_main libc-start.c:102
> >
> >   ...
> >
> >   # perf script -F comm,ip,sym,symoff,srcline,dso
> >
> >   ping
> >       7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
> >     inet_pton.c:68
> >
> >       7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
> >     getaddrinfo.c:537
> >
> >       7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
> >     getaddrinfo.c:2304
> >
> >          130782d6f main+0x3df (/usr/bin/ping)
> >     ping.c:519
> >
> >       7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
> >     libc-start.c:308
> >
> >       7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
> >     libc-start.c:102
> >
> > Fixes: 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries")
> > Signed-off-by: Sandipan Das
> 
> looks good to me, Milian?
> 
> Acked-by: Jiri Olsa 

Sorry for the delay, I was on vacation.

The above looks somewhat strange to me - why is the `(inlined)` suffix no
longer visible?

Also, I can't test this patch locally since, even without this patch, inline
frame resolution with perf seems to be completely broken for me. It doesn't
seem to be a perf regression (going back in time doesn't resolve it); the
cause is rather in its dependencies, or even in the DWARF emitted by the
compilers I have available to test...

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Re: [RFC PATCH] perf/core: exposing type of context-switch-out event

2018-03-01 Thread Milian Wolff
On Thursday, 1 March 2018 20:36:59 CET Andi Kleen wrote:
> > Please also add documentation to Documentation/perf.data-file-format.txt, but
> > I just noticed that not even PERF_RECORD_SWITCH is documented there...
>
> That file only covers fields not generated by the kernel, but this
> is coming from the kernel.
> 
> Kernel records are documented in the manpage, but Vince usually updates
> that on his own.

Ah, TIL - thanks for that tip! But I still think it would be good to have
complete documentation of the perf.data file format in one place. I guess
patches adding more aspects of the file format there would be welcome, even
for records generated by the kernel? That would help third-party tools that
parse perf.data files (like perfparser, used by QtCreator and hotspot).

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Re: [RFC PATCH] perf/core: exposing type of context-switch-out event

2018-03-01 Thread Milian Wolff
On Thursday, 1 March 2018 19:08:05 CET Andi Kleen wrote:
> On Thu, Mar 01, 2018 at 06:40:04PM +0300, Alexey Budankov wrote:
> > Hi,
> > 
> > This patch prototypes exposing the type of context-switch-out event using
> > PERF_RECORD_MISC_EXT_RESERVED bit for PERF_RECORD_SWITCH[_CPU_WIDE]
> > records.
> 
> It would be better to define an actually named bit in perf_event.h.
> It can be the same value.
> 
> Also we would need a patch for perf script / perf report -D to print this
> information.
> 
> The rest looks good to me.

Please also add documentation to Documentation/perf.data-file-format.txt.
That said, I just noticed that not even PERF_RECORD_SWITCH is documented there...

Otherwise I also think that this would be a very nice feature addition!
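For third-party parsers such as perfparser, consuming such a bit could look roughly like the sketch below. The header layout and the constants `PERF_RECORD_SWITCH` (14), `PERF_RECORD_SWITCH_CPU_WIDE` (15), `PERF_RECORD_MISC_SWITCH_OUT` (bit 13) and the value of `PERF_RECORD_MISC_EXT_RESERVED` (bit 15) follow `linux/perf_event.h` and are copied locally so the example stands alone. `SKETCH_MISC_SWITCH_OUT_TYPE` is a hypothetical name for the named bit Andi suggests, and what the flagged kind of switch-out means is left open, since the RFC had not settled on it.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the header layout and constants from linux/perf_event.h. */
struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

#define PERF_RECORD_SWITCH          14
#define PERF_RECORD_SWITCH_CPU_WIDE 15
#define PERF_RECORD_MISC_SWITCH_OUT (1 << 13)
/* Hypothetical name for the bit the RFC borrows (EXT_RESERVED's value). */
#define SKETCH_MISC_SWITCH_OUT_TYPE (1 << 15)

enum switch_kind { NOT_A_SWITCH, SWITCH_IN, SWITCH_OUT, SWITCH_OUT_FLAGGED };

/* Classify a record header; only the switch record types are considered. */
enum switch_kind classify_switch(const struct perf_event_header *hdr)
{
	if (hdr->type != PERF_RECORD_SWITCH &&
	    hdr->type != PERF_RECORD_SWITCH_CPU_WIDE)
		return NOT_A_SWITCH;
	if (!(hdr->misc & PERF_RECORD_MISC_SWITCH_OUT))
		return SWITCH_IN;
	return (hdr->misc & SKETCH_MISC_SWITCH_OUT_TYPE) ? SWITCH_OUT_FLAGGED
							  : SWITCH_OUT;
}
```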

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Re: [PATCH v3] perf/trace : Fix repetitious traces of perf on tracepoint

2018-01-16 Thread Milian Wolff
On Tuesday, January 16, 2018 1:40:38 PM CET Cheng Jian wrote:
> When I use perf to trace the sched_wakeup_new tracepoint, there is
> a bug that outputs the same event repeatedly.
> It can be reproduced by:
> 
>   #./test_fork
>   parent pid : 1059
>   child pid : 1060
>   #perf record -e sched:sched_wakeup_new -p 1060
> 
> test_fork is a demo that generates a wakeup_new event: the parent
> process does nothing but fork a child process, and then they both
> quit.
> 
> There are 4 processors in this machine. Before this patch,
> perf script (perf-1058, parent-1059, child-1060) shows:
> 
> test_fork  1059 [001]    62.913689: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> test_fork  1059 [001]    62.913698: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> test_fork  1059 [001]    62.913705: sched:sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> 
> but ftrace reports this event only once:
> 
>   test_fork-1059  [002] d...   62.913680: sched_wakeup_new: comm=test_fork pid=1060 prio=120 target_cpu=002
> 
> perf script prints the same wakeup_new event multiple times.
> 
> The events which trigger this issue all specify a target process.
> Commit e6dab5ffab59 ("perf/trace: Add ability to set a target task
> for events") introduced a method to trace these events. For
> example, the sched_wakeup and sched_wakeup_new tracepoints will be
> caught when the current task wakes up a target task.
> 
> These events are usually registered per CPU and are also attached
> to the task, so we get all of them from the perf_event_context of
> this task; every one of them matches, yet they are all the same event.
> So check the CPU number of the event to avoid matching it multiple
> times.
> 
> After this patch, perf script (parent-1040, child-1041) shows:
> 
>   test_fork  1040 [002]    36.536079: sched:sched_wakeup_new: comm=test_fork pid=1041 prio=120 target_cpu=003
> 
> The event is matched only once for the traced task (child-1041).

Oh, this sounds awesome. I don't have the setup available to compile a kernel 
with this patch applied, but I think from the description it solves a long-
standing issue with perf's sleep-time profiling.
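The double matching described in the quoted patch can be modelled in a few lines. In this hypothetical sketch (`struct tp_event_model` and `count_matches()` are not kernel APIs), one event per CPU is attached to the target task's context. Without a CPU check every copy matches a single tracepoint hit; with the check, only the copy opened on the CPU that hit the tracepoint does. The actual kernel change works on the real `perf_event_context` event list and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal model of a per-CPU event in a task's context. */
struct tp_event_model {
	int cpu; /* CPU the event was opened on */
};

/*
 * Count how many events in the target task's context would emit a sample
 * for one tracepoint hit on hit_cpu; check_cpu models the proposed fix.
 */
size_t count_matches(const struct tp_event_model *events, size_t n,
		     int hit_cpu, bool check_cpu)
{
	size_t matches = 0;

	for (size_t i = 0; i < n; i++) {
		if (check_cpu && events[i].cpu != hit_cpu)
			continue; /* skip copies belonging to other CPUs */
		matches++;
	}
	return matches;
}
```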

Can someone try this please:
https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times

Use 'sleep 1' as the debuggee. On my system, I get the period multiplied by 
nproc like you describe:

```
$ perf-sleep-record sleep 1
..
$ perf report --stdio --show-total-period | grep "Event count"
..
# Event count (approx.): 8000845488
$ nproc
8
```
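A quick arithmetic check of the numbers above: `sleep 1` should account for roughly 1E9 ns, so dividing the reported event count by that value should give about 1, not `nproc`. The helper below is only illustrative arithmetic on the reported values:

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds in one second: the expected total for `sleep 1`. */
#define NSEC_PER_SEC 1000000000ULL

/* How many times larger the reported sleep time is than expected, rounded. */
uint64_t inflation_factor(uint64_t reported_ns, uint64_t expected_ns)
{
	return (reported_ns + expected_ns / 2) / expected_ns;
}
```

For the reported 8000845488 ns the rounded factor is 8, matching `nproc`; dividing by 8 gives 1000105686 ns, i.e. about 1.0001 s per CPU copy.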

The sleep-record script is available at: 
https://github.com/milianw/shell-helpers/blob/master/perf-sleep-record

I believe your patch also fixes the sched_stat_* tracepoints so that they are
emitted only once instead of once per CPU. Can you verify this? I.e., is the
period finally calculated correctly, giving a value of roughly 1E9 ns == 1 s?

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Re: [PATCH AUTOSEL for 4.14 18/51] perf callchain: Compare symbol name for inlined frames when matching

2017-11-23 Thread Milian Wolff
On Wednesday, November 22, 2017 11:25:40 PM CET alexander.le...@verizon.com 
wrote:
> From: Milian Wolff <milian.wo...@kdab.com>
> 
> [ Upstream commit 9856240ad3269f2fdab0b2fa4400ef8aab792061 ]

Hello Alexander,

this is the first time I have encountered AUTOSEL. I just want to check: the patch 
below depends on others in a whole series that reworks the handling of inline 
frames. Why is only this one being selected? I doubt it can even work 
standalone.

Thanks

> The fake symbols we create for inlined frames will represent different
> functions but can use the symbol start address. This leads to issues
> when different inline branches all lead to the same function.
> 
> Before:
> ~
> $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
> ...
>  --38.86%--_start
>            __libc_start_main
>            main
>            |
>             --37.57%--std::norm (inlined)
>                       std::_Norm_helper::_S_do_it (inlined)
>                       |
>                        --36.36%--std::abs (inlined)
>                                  std::__complex_abs (inlined)
>                                  |
>                                   --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
>                                             std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
>                                             std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
> ~
> 
> Note that this backtrace representation is completely bogus.
> Complex abs does not call the linear congruential engine! It
> is just a side-effect of a longer inlined stack being appended
> to a shorter, different inlined stack, both of which originate
> in the same function (main).
> 
> This patch fixes the issue:
> 
> ~
> $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
> ...
>  --38.86%--_start
>            __libc_start_main
>            main
>            |
>            |--35.59%--std::uniform_real_distribution::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
>            |          std::uniform_real_distribution::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
>            |          |
>            |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
>            |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
>            |                     |
>            |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
>            |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
>            |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
>            |
>             --1.99%--std::norm (inlined)
>                      std::_Norm_helper::_S_do_it (inlined)
>                      std::abs (inlined)
>                      std::__complex_abs (inlined)
> ~
> 
> Signed-off-by: Milian Wolff <milian.wo...@kdab.com>
> Reviewed-by: Jiri Olsa <jo...@redhat.com>
> Reviewed-by: Namhyung Kim
> Cc: David Ahern
> Cc: Peter Zijlstra
> Cc: Ravi Bangoria
> Cc: Yao Jin
> Link: http://lkml.kernel.org/r/20171009203310.17362-10-m

Re: [RFC] perf script: modify field selection option

2017-11-20 Thread Milian Wolff
On Montag, 20. November 2017 21:53:04 CET Stephane Eranian wrote:
> Hi,
> 
> I have been using the perf script -F option on the latest perf and I
> find it not very convenient to use. I appreciate the + and - prefix to
> field names to add or suppress them. But most of the time, I want to
> print only one or two fields and I have to guess which ones are there
> by default so I can suppress them. I think there should be a way to
> say: start from no fields. I understand why you have default to
> maintain compatibility with older perf script but I would like a
> syntax to say: remove defaults. For instance:
> 
> $ perf script -F --,+ip,+syms .
> 
> Where -- would mean drop all defaults.
> 
> Any better suggestions?

Isn't `perf script -F ip,sym` what you want? Note the lack of any '+':

$ perf script -F ip,sym | head -n 5

  206aad x86_pmu_enable
  380591 ctx_resched
  380b46 __perf_event_enable
  378716 event_function

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts




Re: [GIT PULL 00/15] perf/core inlining improvements

2017-10-26 Thread Milian Wolff
On Mittwoch, 25. Oktober 2017 17:59:58 CEST Arnaldo Carvalho de Melo wrote:
> Hi Ingo,
> 
>   Please consider pulling, this is Milian's v7 plus some fixes
> acked by Namhyung after some discussion among the three of us, I
> probably need to pick some more patches that are related to this area,
> but lets make some progress and merge this kit,

Thanks a lot to everyone involved in reviewing this series. Much appreciated!

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts




[tip:perf/core] perf util: Enable handling of inlined frames by default

2017-10-25 Thread tip-bot for Milian Wolff
Commit-ID:  d8a88dd243a170a226aba33e7c53704db2f82aa6
Gitweb: https://git.kernel.org/tip/d8a88dd243a170a226aba33e7c53704db2f82aa6
Author: Milian Wolff <milian.wo...@kdab.com>
AuthorDate: Thu, 19 Oct 2017 13:38:36 +0200
Committer:  Arnaldo Carvalho de Melo <a...@redhat.com>
CommitDate: Wed, 25 Oct 2017 10:50:47 -0300

perf util: Enable handling of inlined frames by default

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable this
useful option by default.

Suggested-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Milian Wolff <milian.wo...@kdab.com>
Reviewed-by: Andi Kleen <a...@linux.intel.com>
Cc: David Ahern <dsah...@gmail.com>
Cc: Jin Yao <yao@linux.intel.com>
Cc: Jiri Olsa <jo...@kernel.org>
Cc: Namhyung Kim <namhy...@kernel.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/Documentation/perf-script.txt | 3 ++-
 tools/perf/util/symbol.c                 | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 383a98d..ddde2b5 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -434,7 +434,8 @@ include::itrace.txt[]
 
 --inline::
    If a callgraph address belongs to an inlined function, the inline stack
-   will be printed. Each entry is function name or file/line.
+   will be printed. Each entry is function name or file/line. Enabled by
+   default, disable with --no-inline.
 
 include::callchain-overhead-calculation.txt[]
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index bcc1ba3..25e6773 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]
 
 --inline::
    If a callgraph address belongs to an inlined function, the inline stack
-   will be printed. Each entry has function name and file/line.
+   will be printed. Each entry has function name and file/line. Enabled by
+   default, disable with --no-inline.
 
 SEE ALSO
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 066e38a..ce6993b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
    .show_hist_headers = true,
    .symfs             = "",
    .event_group       = true,
+   .inline_name       = true,
 };
 
 static enum dso_binary_type binary_type_symtab[] = {

