On 12/18/18 6:35 PM, Steven Rostedt wrote:
> On Tue, 18 Dec 2018 17:46:38 +0100
> Daniel Bristot de Oliveira <bris...@redhat.com> wrote:
>
> I'd say add this file first, before x86 supports it. That way it's easy
> for you to test if this file is correct for other archs that do not
> support it.
>
> When x86 supports it, the "on switch" for that should be the added
> config, just like what other architectures will do.
right!

>> If the architecture supports the batching of jump label updates, use it!
>>
>> An easy way to see the benefits of this patch is switching the
>> schedstats on and off. For instance:
>>
>> -------------------------- %< ----------------------------
>> #!/bin/bash
>>
>> while [ true ]; do
>>     sysctl -w kernel.sched_schedstats=1
>>     sleep 2
>>     sysctl -w kernel.sched_schedstats=0
>>     sleep 2
>> done
>> -------------------------- >% ----------------------------
>>
>> while watching the IPI count:
>>
>> -------------------------- %< ----------------------------
>> # watch -n1 "cat /proc/interrupts | grep Function"
>> -------------------------- >% ----------------------------
>>
>> With the current mode, it is possible to see +- 168 IPIs every 2 seconds,
>> while with this patch the number of IPIs goes down to 3 every 2 seconds.
>>
>> Regarding the performance impact of this patch set, I made two measurements:
>>
>>     The time to update a key (the task that is causing the change)
>>     The time to run the int3 handler (the side effect on a thread that
>>     hits the code being changed)
>>
>> The schedstats static key was chosen as the key to be switched on and off.
>> The reason is that it is used in more than 56 places, in a hot path. The
>> change in the schedstats static key will be done with the following command:
>>
>> while [ true ]; do
>>     sysctl -w kernel.sched_schedstats=1
>>     usleep 500000
>>     sysctl -w kernel.sched_schedstats=0
>>     usleep 500000
>> done
>>
>> In this way, the key will be updated twice per second. To force the hit of
>> the int3 handler, the system will also run a kernel compilation with two
>> jobs per CPU. The test machine is a two-node/24-CPU box with an Intel Xeon
>> processor @2.27GHz.
>>
>> Regarding the update part, on average, the regular kernel takes 57 ms to
>> update the schedstats key, while the kernel with the batch updates takes
>> just 1.4 ms on average. Although it seems too good to be true, it makes
>> sense: the schedstats key is used in 56 places, so it was expected that it
>> would take around 56 times longer to update the keys with the current
>> implementation, as the IPIs are the most expensive part of the update.
>>
>> Regarding the int3 handler, the non-batch handler takes 45 ns on average,
>> while the batch version takes around 180 ns. At first glance, it seems to
>> be a high value. But it is not, considering that it is doing 56 updates,
>> rather than one! It is only taking four times more. This gain is possible
>> because the patch uses a binary search in the vector: log2(56)=5.8. So, an
>> overhead within four times was expected.
>>
>> (voice of tv propaganda) But, that is not all! As the int3 handler stays
>> active for a shorter period (because the update part is on for a shorter
>> time), the number of hits in the int3 handler decreased by 10%.
>>
>> The question then is: Is it worth paying the price of "135 ns" more in the
>> int3 handler?
>>
>> Considering that, in this test case, we are saving the handling of 53 IPIs,
>> which takes more than these 135 ns, it seems to be a meager price to pay.
>> Moreover, the test case was forcing the hit of the int3; in practice, it
>> does not happen that often. The IPI, however, takes place on all CPUs,
>> whether they hit the int3 handler or not!
>>
>> For instance, in an isolated CPU with a process running in user-space
>> (nohz_full use-case), the chances of hitting the int3 handler are nearly
>> zero, while there is no way to avoid the IPIs.
>> By bounding the IPIs, we are improving this scenario a lot.
>>
>> Signed-off-by: Daniel Bristot de Oliveira <bris...@redhat.com>
>> Cc: Thomas Gleixner <t...@linutronix.de>
>> Cc: Ingo Molnar <mi...@redhat.com>
>> Cc: Borislav Petkov <b...@alien8.de>
>> Cc: "H. Peter Anvin" <h...@zytor.com>
>> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>> Cc: Pavel Tatashin <pasha.tatas...@oracle.com>
>> Cc: Masami Hiramatsu <mhira...@kernel.org>
>> Cc: "Steven Rostedt (VMware)" <rost...@goodmis.org>
>> Cc: Zhou Chengming <zhouchengmi...@huawei.com>
>> Cc: Jiri Kosina <jkos...@suse.cz>
>> Cc: Josh Poimboeuf <jpoim...@redhat.com>
>> Cc: "Peter Zijlstra (Intel)" <pet...@infradead.org>
>> Cc: Chris von Recklinghausen <creck...@redhat.com>
>> Cc: Jason Baron <jba...@akamai.com>
>> Cc: Scott Wood <sw...@redhat.com>
>> Cc: Marcelo Tosatti <mtosa...@redhat.com>
>> Cc: Clark Williams <willi...@redhat.com>
>> Cc: x...@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  include/linux/jump_label.h |  3 +++
>>  kernel/jump_label.c        | 29 +++++++++++++++++++++++++++++
>>  2 files changed, 32 insertions(+)
>>
>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>> index c88d903befc0..ed51ef3e1abd 100644
>> --- a/include/linux/jump_label.h
>> +++ b/include/linux/jump_label.h
>> @@ -219,6 +219,9 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
>>  				      enum jump_label_type type);
>>  extern void arch_jump_label_transform_static(struct jump_entry *entry,
>>  					      enum jump_label_type type);
>> +extern int arch_jump_label_transform_queue(struct jump_entry *entry,
>> +					    enum jump_label_type type);
>> +extern void arch_jump_label_transform_apply(void);
>>  extern int jump_label_text_reserved(void *start, void *end);
>>  extern void static_key_slow_inc(struct static_key *key);
>>  extern void static_key_slow_dec(struct static_key *key);
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index 22093b5f57c9..08ace142af0a 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -409,6 +409,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
>>  	return 0;
>>  }
>>
>> +#ifndef HAVE_JUMP_LABEL_BATCH
>>  static void __jump_label_update(struct static_key *key,
>>  				struct jump_entry *entry,
>>  				struct jump_entry *stop,
>> @@ -421,6 +422,34 @@ static void __jump_label_update(struct static_key *key,
>>  		}
>>  	}
>>  }
>> +#else
>> +static void __jump_label_update(struct static_key *key,
>> +				struct jump_entry *entry,
>> +				struct jump_entry *stop,
>> +				bool init)
>> +{
>> +	for_each_label_entry(key, entry, stop) {
>> +
>> +		if (!jump_label_can_update_check(entry, init))
>> +			continue;
>> +
>> +		if (arch_jump_label_transform_queue(entry,
>> +						    jump_label_type(entry)))
>> +			continue;
>> +
>> +		/*
>> +		 * Queue overflow: Apply the current queue, and then
>> +		 * queue again. If it is still not possible to queue, BUG!
>> +		 */
>> +		arch_jump_label_transform_apply();
>> +		if (!arch_jump_label_transform_queue(entry,
>> +						     jump_label_type(entry))) {
>> +			BUG();
>
> Why BUG()? Do you really want to crash Linus's machine?

I am using BUG() because that is what I see in other parts of the
jump_label code: if something goes wrong, BUG().

What I could do here is:

Add a "fallback" bool that is disabled by default.

If I hit this case:
	WARN()
	turn "fallback" on, returning to the old mode (without batch)

(a rough sketch of that idea is at the end of this message)

Sound better?

Thanks!

-- Daniel

> -- Steve
>
>> +		}
>> +	}
>> +	arch_jump_label_transform_apply();
>> +}
>> +#endif
>>
>>  void __init jump_label_init(void)
>>  {
>
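Something like this (untested sketch, only to illustrate the WARN()-and-fall-back
idea; the jump_label_batch_broken flag is hypothetical and not part of this patch,
the rest uses the helpers already shown in the diff above):

-------------------------- %< ----------------------------
/* Hypothetical flag: set once the batch path proves unusable. */
static bool jump_label_batch_broken;

static void __jump_label_update(struct static_key *key,
				struct jump_entry *entry,
				struct jump_entry *stop,
				bool init)
{
	for_each_label_entry(key, entry, stop) {

		if (!jump_label_can_update_check(entry, init))
			continue;

		/* Old mode: one transform (and its IPIs) per entry. */
		if (jump_label_batch_broken) {
			arch_jump_label_transform(entry, jump_label_type(entry));
			continue;
		}

		if (arch_jump_label_transform_queue(entry,
						    jump_label_type(entry)))
			continue;

		/* Queue overflow: apply the current queue and retry once. */
		arch_jump_label_transform_apply();
		if (!arch_jump_label_transform_queue(entry,
						     jump_label_type(entry))) {
			WARN(1, "jump_label: batch update failed, falling back to non-batch mode\n");
			jump_label_batch_broken = true;
			arch_jump_label_transform(entry, jump_label_type(entry));
		}
	}

	/* The queue only has pending entries while batch mode is still trusted. */
	if (!jump_label_batch_broken)
		arch_jump_label_transform_apply();
}
-------------------------- >% ----------------------------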
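For reference, here is a tiny stand-alone (userspace) illustration of the binary
search mentioned in the changelog: finding the trapped address among 56 sorted
patch sites takes at most ceil(log2(56)) = 6 comparisons, which is why the batch
int3 handler can do 56x the work for only ~4x the time. The addresses are made
up; the kernel would search its own sorted vector of queued entries.

-------------------------- %< ----------------------------
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define NR_SITES 56

static int cmp_addr(const void *key, const void *elem)
{
	uint64_t k = *(const uint64_t *)key;
	uint64_t e = *(const uint64_t *)elem;

	return (k > e) - (k < e);
}

int main(void)
{
	uint64_t sites[NR_SITES];	/* stand-in for the queued patch sites */
	uint64_t ip;
	int i;

	/* Made-up addresses, kept sorted so bsearch() can be used. */
	for (i = 0; i < NR_SITES; i++)
		sites[i] = 0xffffffff81000000ULL + i * 0x40;

	ip = sites[42];			/* the address the int3 trapped on */

	if (bsearch(&ip, sites, NR_SITES, sizeof(sites[0]), cmp_addr))
		printf("%#llx is one of the %d queued patch sites\n",
		       (unsigned long long)ip, NR_SITES);

	return 0;
}
-------------------------- >% ----------------------------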