[tip:x86/debug] sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()
Commit-ID:  821eae7d14f0bbf69df1cc4656c54900b2672928
Gitweb:     http://git.kernel.org/tip/821eae7d14f0bbf69df1cc4656c54900b2672928
Author:     Josh Poimboeuf
AuthorDate: Thu, 18 Feb 2016 11:41:58 -0600
Committer:  Ingo Molnar
CommitDate: Wed, 24 Feb 2016 08:35:45 +0100

sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace](). All the others already had stack
frames.

Oddly, this change actually seems to make things faster in a lot of
cases. For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 <cyc2ns_read_end>:
  8101f8c0:  55                      push   %rbp
  8101f8c1:  48 89 e5                mov    %rsp,%rbp
  8101f8c4:  83 6f 10 01             subl   $0x1,0x10(%rdi)
  8101f8c8:  75 08                   jne    8101f8d2
  8101f8ca:  65 48 89 3d e6 5a ff    mov    %rdi,%gs:0x7eff5ae6(%rip)  # 153b8
  8101f8d1:  7e
  8101f8d2:  65 ff 0d 77 c4 fe 7e    decl   %gs:0x7efec477(%rip)  # bd50 <__preempt_count>
  8101f8d9:  74 02                   je     8101f8dd
  8101f8db:  5d                      pop    %rbp
  8101f8dc:  c3                      retq
  8101f8dd:  e8 1e 37 fe ff          callq  81003000 <___preempt_schedule>
  8101f8e2:  5d                      pop    %rbp
  8101f8e3:  c3                      retq
  8101f8e4:  66 66 66 2e 0f 1f 84    data16 data16 nopw %cs:0x0(%rax,%rax,1)
  8101f8eb:  00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 <cyc2ns_read_end>:
  8101f8c0:  83 6f 10 01             subl   $0x1,0x10(%rdi)
  8101f8c4:  75 08                   jne    8101f8ce
  8101f8c6:  65 48 89 3d ea 5a ff    mov    %rdi,%gs:0x7eff5aea(%rip)  # 153b8
  8101f8cd:  7e
  8101f8ce:  65 ff 0d 7b c4 fe 7e    decl   %gs:0x7efec47b(%rip)  # bd50 <__preempt_count>
  8101f8d5:  74 01                   je     8101f8d8
  8101f8d7:  c3                      retq
  8101f8d8:  55                      push   %rbp
  8101f8d9:  48 89 e5                mov    %rsp,%rbp
  8101f8dc:  e8 1f 37 fe ff          callq  81003000 <___preempt_schedule>
  8101f8e1:  5d                      pop    %rbp
  8101f8e2:  c3                      retq
  8101f8e3:  66 66 66 66 2e 0f 1f    data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
  8101f8ea:  84 00 00 00 00 00

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call
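
The "call without frame pointer save/setup" condition can be modelled with a toy checker. The sketch below is not stacktool's implementation; it is a simplified illustration that walks every path through a listing and reports any call reached before "push %rbp; mov %rsp,%rbp". The function name find_frameless_calls and the tuple layout are assumptions made for this example. ORIGINAL and PATCHED are transcribed from the two cyc2ns_read_end() listings above (the nop padding and the spilled byte lines are left out), while UNFIXED_SHAPE is a hypothetical listing invented to show the failing case and is not taken from the commit.

```python
# Toy model of the frame pointer check. Each instruction is a tuple of
# (address, mnemonic, operands). This is an illustration only and is NOT
# how stacktool is implemented.

def find_frameless_calls(insns):
    """Return sorted addresses of calls reachable with no frame set up."""
    by_addr = {addr: i for i, (addr, _, _) in enumerate(insns)}
    bad, seen = set(), set()
    work = [(0, False, False)]  # (index, rbp pushed?, frame ready?)
    while work:
        state = work.pop()
        if state in seen or state[0] >= len(insns):
            continue
        seen.add(state)
        i, pushed, frame = state
        addr, op, arg = insns[i]
        if op == "push" and arg == "%rbp":
            pushed = True
        elif op == "mov" and arg == "%rsp,%rbp" and pushed:
            frame = True
        elif op == "pop" and arg == "%rbp":
            pushed = frame = False
        elif op.startswith("call"):
            if not frame:
                bad.add(addr)
        elif op.startswith("ret"):
            continue  # this path ends here
        elif op.startswith("j"):
            # Follow the branch target; conditional jumps also fall through.
            work.append((by_addr[int(arg, 16)], pushed, frame))
            if op == "jmp":
                continue
        work.append((i + 1, pushed, frame))
    return sorted(bad)

# Transcribed from the original cyc2ns_read_end() listing.
ORIGINAL = [
    (0x8101f8c0, "push", "%rbp"),
    (0x8101f8c1, "mov", "%rsp,%rbp"),
    (0x8101f8c4, "subl", "$0x1,0x10(%rdi)"),
    (0x8101f8c8, "jne", "8101f8d2"),
    (0x8101f8ca, "mov", "%rdi,%gs:0x7eff5ae6(%rip)"),
    (0x8101f8d2, "decl", "%gs:0x7efec477(%rip)"),
    (0x8101f8d9, "je", "8101f8dd"),
    (0x8101f8db, "pop", "%rbp"),
    (0x8101f8dc, "retq", ""),
    (0x8101f8dd, "callq", "81003000"),
    (0x8101f8e2, "pop", "%rbp"),
    (0x8101f8e3, "retq", ""),
]

# Transcribed from the patched cyc2ns_read_end() listing.
PATCHED = [
    (0x8101f8c0, "subl", "$0x1,0x10(%rdi)"),
    (0x8101f8c4, "jne", "8101f8ce"),
    (0x8101f8c6, "mov", "%rdi,%gs:0x7eff5aea(%rip)"),
    (0x8101f8ce, "decl", "%gs:0x7efec47b(%rip)"),
    (0x8101f8d5, "je", "8101f8d8"),
    (0x8101f8d7, "retq", ""),
    (0x8101f8d8, "push", "%rbp"),
    (0x8101f8d9, "mov", "%rsp,%rbp"),
    (0x8101f8dc, "callq", "81003000"),
    (0x8101f8e1, "pop", "%rbp"),
    (0x8101f8e2, "retq", ""),
]

# HYPOTHETICAL listing, invented for illustration: the call is emitted
# with no frame pointer setup at all, which is the shape being warned about.
UNFIXED_SHAPE = [
    (0x0, "decl", "%gs:0x0(%rip)"),
    (0x7, "je", "a"),
    (0x9, "retq", ""),
    (0xa, "callq", "81003000"),
    (0xf, "retq", ""),
]

for name, listing in [("original", ORIGINAL), ("patched", PATCHED),
                      ("unfixed shape", UNFIXED_SHAPE)]:
    print(name, [hex(a) for a in find_frameless_calls(listing)])
```

In this model both real listings pass, because every path that reaches the callq has already saved and set up %rbp. Only the invented listing is flagged.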
[tip:x86/debug] sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()
Commit-ID:  b5429dac54a31359e508add8572ebe8d29b8cbdb
Gitweb:     http://git.kernel.org/tip/b5429dac54a31359e508add8572ebe8d29b8cbdb
Author:     Josh Poimboeuf
AuthorDate: Thu, 18 Feb 2016 11:41:58 -0600
Committer:  Ingo Molnar
CommitDate: Tue, 23 Feb 2016 09:04:01 +0100

sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()

If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace](). All the others already had stack
frames.

Oddly, this change actually seems to make things faster in a lot of
cases. For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 <cyc2ns_read_end>:
  8101f8c0:  55                      push   %rbp
  8101f8c1:  48 89 e5                mov    %rsp,%rbp
  8101f8c4:  83 6f 10 01             subl   $0x1,0x10(%rdi)
  8101f8c8:  75 08                   jne    8101f8d2
  8101f8ca:  65 48 89 3d e6 5a ff    mov    %rdi,%gs:0x7eff5ae6(%rip)  # 153b8
  8101f8d1:  7e
  8101f8d2:  65 ff 0d 77 c4 fe 7e    decl   %gs:0x7efec477(%rip)  # bd50 <__preempt_count>
  8101f8d9:  74 02                   je     8101f8dd
  8101f8db:  5d                      pop    %rbp
  8101f8dc:  c3                      retq
  8101f8dd:  e8 1e 37 fe ff          callq  81003000 <___preempt_schedule>
  8101f8e2:  5d                      pop    %rbp
  8101f8e3:  c3                      retq
  8101f8e4:  66 66 66 2e 0f 1f 84    data16 data16 nopw %cs:0x0(%rax,%rax,1)
  8101f8eb:  00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 <cyc2ns_read_end>:
  8101f8c0:  83 6f 10 01             subl   $0x1,0x10(%rdi)
  8101f8c4:  75 08                   jne    8101f8ce
  8101f8c6:  65 48 89 3d ea 5a ff    mov    %rdi,%gs:0x7eff5aea(%rip)  # 153b8
  8101f8cd:  7e
  8101f8ce:  65 ff 0d 7b c4 fe 7e    decl   %gs:0x7efec47b(%rip)  # bd50 <__preempt_count>
  8101f8d5:  74 01                   je     8101f8d8
  8101f8d7:  c3                      retq
  8101f8d8:  55                      push   %rbp
  8101f8d9:  48 89 e5                mov    %rsp,%rbp
  8101f8dc:  e8 1f 37 fe ff          callq  81003000 <___preempt_schedule>
  8101f8e1:  5d                      pop    %rbp
  8101f8e2:  c3                      retq
  8101f8e3:  66 66 66 66 2e 0f 1f    data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
  8101f8ea:  84 00 00 00 00 00

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call
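
To make the "moved out of the common path" observation concrete, the sketch below counts the instructions executed along one path through each cyc2ns_read_end() listing. It is only a rough illustration: instruction counts are not a performance measurement, and the helper name count_executed, the tuple layout and the choice of which branches are taken are assumptions made for this example. The jne is assumed to be not taken, and the callee's own instructions are not counted.

```python
# Count instructions executed from function entry to the first ret,
# following only the branches whose mnemonic is listed in `taken`.
# A call counts as one instruction; the callee itself is not modelled.

def count_executed(insns, taken):
    by_addr = {addr: i for i, (addr, _, _) in enumerate(insns)}
    i = count = 0
    while True:
        _, op, arg = insns[i]
        count += 1
        if op.startswith("ret"):
            return count
        if op.startswith("j") and op in taken:
            i = by_addr[int(arg, 16)]  # branch taken: jump to target
        else:
            i += 1                     # fall through

# Transcribed from the original listing (nop padding omitted).
ORIGINAL = [
    (0x8101f8c0, "push", "%rbp"),
    (0x8101f8c1, "mov", "%rsp,%rbp"),
    (0x8101f8c4, "subl", "$0x1,0x10(%rdi)"),
    (0x8101f8c8, "jne", "8101f8d2"),
    (0x8101f8ca, "mov", "%rdi,%gs:0x7eff5ae6(%rip)"),
    (0x8101f8d2, "decl", "%gs:0x7efec477(%rip)"),
    (0x8101f8d9, "je", "8101f8dd"),
    (0x8101f8db, "pop", "%rbp"),
    (0x8101f8dc, "retq", ""),
    (0x8101f8dd, "callq", "81003000"),
    (0x8101f8e2, "pop", "%rbp"),
    (0x8101f8e3, "retq", ""),
]

# Transcribed from the patched listing (nop padding omitted).
PATCHED = [
    (0x8101f8c0, "subl", "$0x1,0x10(%rdi)"),
    (0x8101f8c4, "jne", "8101f8ce"),
    (0x8101f8c6, "mov", "%rdi,%gs:0x7eff5aea(%rip)"),
    (0x8101f8ce, "decl", "%gs:0x7efec47b(%rip)"),
    (0x8101f8d5, "je", "8101f8d8"),
    (0x8101f8d7, "retq", ""),
    (0x8101f8d8, "push", "%rbp"),
    (0x8101f8d9, "mov", "%rsp,%rbp"),
    (0x8101f8dc, "callq", "81003000"),
    (0x8101f8e1, "pop", "%rbp"),
    (0x8101f8e2, "retq", ""),
]

# Common path: je not taken, so ___preempt_schedule is never called.
print("common path:  ", count_executed(ORIGINAL, set()),
      "->", count_executed(PATCHED, set()))    # 9 -> 6
# Unlikely path: je taken, so ___preempt_schedule is called.
print("unlikely path:", count_executed(ORIGINAL, {"je"}),
      "->", count_executed(PATCHED, {"je"}))   # 10 -> 10
```

Under these assumptions the common path drops the push, mov and pop of %rbp, while the unlikely path executes the same number of instructions as before.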