On Tue, May 16, 2017 at 06:26:47PM +0200, Milian Wolff wrote: > On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote: > > On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote: > > > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote: > > > > As the documentation for dwfl_frame_pc says, frames that > > > > are no activation frames need to have their program counter > > > > decremented by one to properly find the function of the caller. > > > > > > > > This fixes many cases where perf report currently attributes > > > > the cost to the next line. I.e. I have code like this: > > > > > > > > ~~~~~~~~~~~~~~~ > > > > > > > > #include <thread> > > > > #include <chrono> > > > > > > > > using namespace std; > > > > > > > > int main() > > > > { > > > > > > > > this_thread::sleep_for(chrono::milliseconds(1000)); > > > > this_thread::sleep_for(chrono::milliseconds(100)); > > > > this_thread::sleep_for(chrono::milliseconds(10)); > > > > > > > > return 0; > > > > > > > > } > > > > > > > > ~~~~~~~~~~~~~~~ > > > > > > It'd be nice if the test program has a signal frame for verification. > > > > I have pretty much zero experience about signals. Would it be enough to add > > a signal handler for, say, SIGUSR1 to my test application and then trigger > > a sleep when that signal is delivered? If that should be enough, I'll write > > and test it out. > > ~~~~~~~~~~~~~~~ > #include <thread> > #include <chrono> > #include <signal.h> > > using namespace std; > > volatile bool run_loop = true; > > void my_handler(int signum) > { > this_thread::sleep_for(chrono::milliseconds(1000)); > this_thread::sleep_for(chrono::milliseconds(100)); > this_thread::sleep_for(chrono::milliseconds(10)); > run_loop = false; > } > > int main() > { > signal(SIGUSR1, my_handler); > > while (run_loop) {} > > return 0; > } > ~~~~~~~~~~~~~~~ > > This does not properly unwind neither before nor after this patch. I only > ever > get: > > 100.00% core.c:0 > | > ---__schedule core.c:0 > schedule > do_nanosleep hrtimer.c:0 > hrtimer_nanosleep > sys_nanosleep > entry_SYSCALL_64_fastpath .tmp_entry_64.o:0 > __nanosleep_nocancel .:0 > std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > > thread:323 > > So... should this work? Please tell me how to test this properly.
How did you send the SIGUSR1 to the process? Anyway it does nothing to do with inlining, right? I just wrote a test code below to burn a cpu with or without a signal frame. $ nl -ba frame-addr.c 1 #include <stdio.h> 2 #include <stdlib.h> 3 #include <signal.h> 4 5 #define __noinline __attribute__((noinline)) 6 7 __noinline void bar(void) 8 { 9 volatile long cnt = 0; 10 11 for (cnt = 0; cnt < 100000000; cnt++); 12 } 13 14 __noinline void foo(void) 15 { 16 bar(); 17 } 18 19 void sig_handler(int sig) 20 { 21 foo(); 22 } 23 24 int main(void) 25 { 26 signal(SIGUSR1, sig_handler); 27 raise(SIGUSR1); 28 29 foo(); 30 return 0; 31 } $ gcc -O2 -g -fno-optimize-sibling-calls -o frame-addr frame-addr.c $ perf record --call-graph dwarf ./frame-addr $ perf report -q -g srcline | head -15 99.88% frame-addr frame-addr [.] bar | ---bar frame-addr.c:11 foo frame-addr.c:16 | |--51.12%--main frame-addr.c:29 | __libc_start_main | _start | --48.76%--sig_handler frame-addr.c:21 0x33a8f raise .:0 main frame-addr.c:29 <--- bad __libc_start_main _start Note that 'raise' was called at line 27. It seems that simply checking current frame fixes it. Thanks, Namhyung