On Sun, May 21, 2017 at 11:51:39AM +0300, Konstantin Khlebnikov wrote: > On 20.05.2017 19:42, Paul E. McKenney wrote: > >On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote: > >>This allows to get rid of unneeded invocations. > >> > >>Function debug_lockdep_rcu_enabled() becomes really hot if several > >>debug options are enabled together with CONFIG_PROVE_RCU. > >> > >>Hottest path ends with: > >> debug_lockdep_rcu_enabled > >> is_ftrace_trampoline > >> __kernel_text_address > >> > >>Here debug_lockdep_rcu_enabled() is called from condition > >>(debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro > >>do_for_each_ftrace_op(), where "c" is false. > >> > >>With this patch "netperf -H localhost" shows boost from 2400 to 2500. > >> > >>Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> > > > >Nice performance increase! > > > >The gcc documentation says that __attribute__((pure)) functions are > >supposed to have return values that depend only at the function's > >arguments and the values of global variables. However, it also says: > > > > Interesting non-pure functions are functions with infinite loops > > or those depending on volatile memory or other system resource, > > that may change between two consecutive calls (such as feof in > > a multithreading environment). > > > >This is OK for current->lockdep_recursion because this variable is changed > >only by the current task (I think so, anyway). > > > >It is sort of OK for debug_locks. This could be set to zero at any time > >by any other task, but if we have a race condition that very rarely causes > >two lockdep splats instead of just one, so what? (But I am sure that > >some of the people on CC will correct me if I am wrong here.) > > > >It should be OK for rcu_scheduler_active because the transition from > >RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first > >context switch, and the various barrier() calls, implied as well as > >explicit, should keep things straight. > > > >But I don't totally trust my analysis. Could you please get someone > >more gcc-savvy to review this and give their ack/review? Given that, > >I will queue it. > > > > Thanx, Paul > > Thank you for analisys. > > __attribute__((pure)) allows compiler to deduplicate / eliminate calls. > This indeed might expand race windows when global switches like debug_locks > changes their state. But strict synchronization here isn't required.
Any objections to this patch? Thanx, Paul > >>--- > >> include/linux/rcupdate.h | 2 +- > >> kernel/rcu/update.c | 2 +- > >> 2 files changed, 2 insertions(+), 2 deletions(-) > >> > >>diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > >>index e1e5d002fdb9..9ecb3cb715bd 100644 > >>--- a/include/linux/rcupdate.h > >>+++ b/include/linux/rcupdate.h > >>@@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map; > >> extern struct lockdep_map rcu_bh_lock_map; > >> extern struct lockdep_map rcu_sched_lock_map; > >> extern struct lockdep_map rcu_callback_map; > >>-int debug_lockdep_rcu_enabled(void); > >>+int __pure debug_lockdep_rcu_enabled(void); > >> > >> int rcu_read_lock_held(void); > >> int rcu_read_lock_bh_held(void); > >>diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > >>index 273e869ca21d..a0c30abefdcd 100644 > >>--- a/kernel/rcu/update.c > >>+++ b/kernel/rcu/update.c > >>@@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map = > >> STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key); > >> EXPORT_SYMBOL_GPL(rcu_callback_map); > >> > >>-int notrace debug_lockdep_rcu_enabled(void) > >>+int __pure notrace debug_lockdep_rcu_enabled(void) > >> { > >> return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > >> current->lockdep_recursion == 0; > >> > > >