[tip:x86/debug] sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

2016-02-24 Thread tip-bot for Josh Poimboeuf
Commit-ID:  821eae7d14f0bbf69df1cc4656c54900b2672928
Gitweb: http://git.kernel.org/tip/821eae7d14f0bbf69df1cc4656c54900b2672928
Author: Josh Poimboeuf 
AuthorDate: Thu, 18 Feb 2016 11:41:58 -0600
Committer:  Ingo Molnar 
CommitDate: Wed, 24 Feb 2016 08:35:45 +0100

sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: 
call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame 
pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without 
frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without 
frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call 
without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call 
without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace]().  All the others already had stack frames.

Oddly, this change actually seems to make things faster in a lot of
cases.  For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 :
  8101f8c0: 55  push   %rbp
  8101f8c1: 48 89 e5mov%rsp,%rbp
  8101f8c4: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c8: 75 08   jne8101f8d2 

  8101f8ca: 65 48 89 3d e6 5a ffmov
%rdi,%gs:0x7eff5ae6(%rip)# 153b8 
  8101f8d1: 7e
  8101f8d2: 65 ff 0d 77 c4 fe 7edecl   %gs:0x7efec477(%rip) 
   # bd50 <__preempt_count>
  8101f8d9: 74 02   je 8101f8dd 

  8101f8db: 5d  pop%rbp
  8101f8dc: c3  retq
  8101f8dd: e8 1e 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e2: 5d  pop%rbp
  8101f8e3: c3  retq
  8101f8e4: 66 66 66 2e 0f 1f 84data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8eb: 00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 :
  8101f8c0: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c4: 75 08   jne8101f8ce 

  8101f8c6: 65 48 89 3d ea 5a ffmov
%rdi,%gs:0x7eff5aea(%rip)# 153b8 
  8101f8cd: 7e
  8101f8ce: 65 ff 0d 7b c4 fe 7edecl   %gs:0x7efec47b(%rip) 
   # bd50 <__preempt_count>
  8101f8d5: 74 01   je 8101f8d8 

  8101f8d7: c3  retq
  8101f8d8: 55  push   %rbp
  8101f8d9: 48 89 e5mov%rsp,%rbp
  8101f8dc: e8 1f 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e1: 5d  pop%rbp
  8101f8e2: c3  retq
  8101f8e3: 66 66 66 66 2e 0f 1fdata16 data16 data16 nopw 

[tip:x86/debug] sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

2016-02-24 Thread tip-bot for Josh Poimboeuf
Commit-ID:  821eae7d14f0bbf69df1cc4656c54900b2672928
Gitweb: http://git.kernel.org/tip/821eae7d14f0bbf69df1cc4656c54900b2672928
Author: Josh Poimboeuf 
AuthorDate: Thu, 18 Feb 2016 11:41:58 -0600
Committer:  Ingo Molnar 
CommitDate: Wed, 24 Feb 2016 08:35:45 +0100

sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: 
call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame 
pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without 
frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without 
frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call 
without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call 
without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace]().  All the others already had stack frames.

Oddly, this change actually seems to make things faster in a lot of
cases.  For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 :
  8101f8c0: 55  push   %rbp
  8101f8c1: 48 89 e5mov%rsp,%rbp
  8101f8c4: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c8: 75 08   jne8101f8d2 

  8101f8ca: 65 48 89 3d e6 5a ffmov
%rdi,%gs:0x7eff5ae6(%rip)# 153b8 
  8101f8d1: 7e
  8101f8d2: 65 ff 0d 77 c4 fe 7edecl   %gs:0x7efec477(%rip) 
   # bd50 <__preempt_count>
  8101f8d9: 74 02   je 8101f8dd 

  8101f8db: 5d  pop%rbp
  8101f8dc: c3  retq
  8101f8dd: e8 1e 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e2: 5d  pop%rbp
  8101f8e3: c3  retq
  8101f8e4: 66 66 66 2e 0f 1f 84data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8eb: 00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 :
  8101f8c0: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c4: 75 08   jne8101f8ce 

  8101f8c6: 65 48 89 3d ea 5a ffmov
%rdi,%gs:0x7eff5aea(%rip)# 153b8 
  8101f8cd: 7e
  8101f8ce: 65 ff 0d 7b c4 fe 7edecl   %gs:0x7efec47b(%rip) 
   # bd50 <__preempt_count>
  8101f8d5: 74 01   je 8101f8d8 

  8101f8d7: c3  retq
  8101f8d8: 55  push   %rbp
  8101f8d9: 48 89 e5mov%rsp,%rbp
  8101f8dc: e8 1f 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e1: 5d  pop%rbp
  8101f8e2: c3  retq
  8101f8e3: 66 66 66 66 2e 0f 1fdata16 data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8ea: 84 00 00 00 00 00

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call 

[tip:x86/debug] sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

2016-02-23 Thread tip-bot for Josh Poimboeuf
Commit-ID:  b5429dac54a31359e508add8572ebe8d29b8cbdb
Gitweb: http://git.kernel.org/tip/b5429dac54a31359e508add8572ebe8d29b8cbdb
Author: Josh Poimboeuf 
AuthorDate: Thu, 18 Feb 2016 11:41:58 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 23 Feb 2016 09:04:01 +0100

sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: 
call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame 
pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without 
frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without 
frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call 
without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call 
without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace]().  All the others already had stack frames.

Oddly, this change actually seems to make things faster in a lot of
cases.  For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 :
  8101f8c0: 55  push   %rbp
  8101f8c1: 48 89 e5mov%rsp,%rbp
  8101f8c4: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c8: 75 08   jne8101f8d2 

  8101f8ca: 65 48 89 3d e6 5a ffmov
%rdi,%gs:0x7eff5ae6(%rip)# 153b8 
  8101f8d1: 7e
  8101f8d2: 65 ff 0d 77 c4 fe 7edecl   %gs:0x7efec477(%rip) 
   # bd50 <__preempt_count>
  8101f8d9: 74 02   je 8101f8dd 

  8101f8db: 5d  pop%rbp
  8101f8dc: c3  retq
  8101f8dd: e8 1e 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e2: 5d  pop%rbp
  8101f8e3: c3  retq
  8101f8e4: 66 66 66 2e 0f 1f 84data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8eb: 00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 :
  8101f8c0: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c4: 75 08   jne8101f8ce 

  8101f8c6: 65 48 89 3d ea 5a ffmov
%rdi,%gs:0x7eff5aea(%rip)# 153b8 
  8101f8cd: 7e
  8101f8ce: 65 ff 0d 7b c4 fe 7edecl   %gs:0x7efec47b(%rip) 
   # bd50 <__preempt_count>
  8101f8d5: 74 01   je 8101f8d8 

  8101f8d7: c3  retq
  8101f8d8: 55  push   %rbp
  8101f8d9: 48 89 e5mov%rsp,%rbp
  8101f8dc: e8 1f 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e1: 5d  pop%rbp
  8101f8e2: c3  retq
  8101f8e3: 66 66 66 66 2e 0f 1fdata16 data16 data16 nopw 

[tip:x86/debug] sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

2016-02-23 Thread tip-bot for Josh Poimboeuf
Commit-ID:  b5429dac54a31359e508add8572ebe8d29b8cbdb
Gitweb: http://git.kernel.org/tip/b5429dac54a31359e508add8572ebe8d29b8cbdb
Author: Josh Poimboeuf 
AuthorDate: Thu, 18 Feb 2016 11:41:58 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 23 Feb 2016 09:04:01 +0100

sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: 
call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame 
pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without 
frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without 
frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call 
without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without 
frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame 
pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call 
without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace]().  All the others already had stack frames.

Oddly, this change actually seems to make things faster in a lot of
cases.  For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 :
  8101f8c0: 55  push   %rbp
  8101f8c1: 48 89 e5mov%rsp,%rbp
  8101f8c4: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c8: 75 08   jne8101f8d2 

  8101f8ca: 65 48 89 3d e6 5a ffmov
%rdi,%gs:0x7eff5ae6(%rip)# 153b8 
  8101f8d1: 7e
  8101f8d2: 65 ff 0d 77 c4 fe 7edecl   %gs:0x7efec477(%rip) 
   # bd50 <__preempt_count>
  8101f8d9: 74 02   je 8101f8dd 

  8101f8db: 5d  pop%rbp
  8101f8dc: c3  retq
  8101f8dd: e8 1e 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e2: 5d  pop%rbp
  8101f8e3: c3  retq
  8101f8e4: 66 66 66 2e 0f 1f 84data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8eb: 00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 :
  8101f8c0: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c4: 75 08   jne8101f8ce 

  8101f8c6: 65 48 89 3d ea 5a ffmov
%rdi,%gs:0x7eff5aea(%rip)# 153b8 
  8101f8cd: 7e
  8101f8ce: 65 ff 0d 7b c4 fe 7edecl   %gs:0x7efec47b(%rip) 
   # bd50 <__preempt_count>
  8101f8d5: 74 01   je 8101f8d8 

  8101f8d7: c3  retq
  8101f8d8: 55  push   %rbp
  8101f8d9: 48 89 e5mov%rsp,%rbp
  8101f8dc: e8 1f 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e1: 5d  pop%rbp
  8101f8e2: c3  retq
  8101f8e3: 66 66 66 66 2e 0f 1fdata16 data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8ea: 84 00 00 00 00 00

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call