This series lets call_rcu() callbacks be reclaimed as soon as either a normal or an expedited grace period that covers them has elapsed, rather than always waiting for a normal grace period.
Motivation
==========

Today there is an asymmetry: synchronize_rcu_expedited() callers get fast
reclaim, but call_rcu() callers never benefit from those same expedited grace
periods. An expedited GP proves exactly the same thing as a normal one: all
pre-existing readers are done. When expedited GPs are running on the system
(driven by other subsystems), call_rcu() callbacks that could already be freed
instead sit in RCU_WAIT_TAIL until the next normal GP completes.

This series treats a grace period as a grace period regardless of how it was
driven, so memory is reclaimed sooner.

Design
======

Callback segments now record both the normal and the expedited grace-period
sequence numbers in a struct rcu_gp_seq. rcu_segcblist_advance() releases a
segment as soon as poll_state_synchronize_rcu_full() reports that either one
has completed.

Three notification paths are taught about expedited completion so that the
advance actually happens: the NOCB rcuog kthreads, the rcu_pending() tick
gate, and rcu_core().

Changelog:

RFC: https://lore.kernel.org/all/[email protected]/

Changes in v1:
- New prep patch 1 renames struct rcu_gp_oldstate to struct rcu_gp_seq and
  its fields rgos_norm/rgos_exp to norm/exp tree-wide (Frederic).
- The rcu_segcblist segment field stays named gp_seq; only its type changes
  (Frederic).
- Patch 8 (NOCB wake) is reworked. The RFC woke the wrong waitqueue
  (rdp_gp->nocb_gp_wq via wake_nocb_gp() rather than the leaf
  rnp->nocb_gp_wq[] that an rcuog kthread waiting for a GP sleeps on), and
  its wait condition checked only the normal ->gp_seq. The rcuog grace-period
  wait now tracks a struct rcu_gp_seq and is released via
  poll_state_synchronize_rcu_full(); rcu_exp_wait_wake() wakes the leaf node
  through the new rcu_nocb_exp_cleanup() (Frederic).
- rcu_pending() uses a new poll_state_synchronize_rcu_full_unordered(), which
  provides no memory ordering, to avoid memory barriers on every tick;
  rcu_core() remains responsible for ordering (Frederic).
Still open: Frederic asked whether the first smp_mb() in
poll_state_synchronize_rcu_full() is needed on the callback-advance path
(patch 6). That path still uses the fully ordered helper; only rcu_pending()
was switched to the unordered variant. Happy to revisit.

Puranjay Mohan (11):
  rcu: Rename struct rcu_gp_oldstate to rcu_gp_seq
  rcu/segcblist: Add SRCU and Tasks RCU wrapper functions
  rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper
  rcu/segcblist: Track segment grace periods with struct rcu_gp_seq
  rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited GPs
  rcu: Enable RCU callbacks to benefit from expedited grace periods
  rcu: Update comments for gp_seq and expedited GP tracking
  rcu: Wake NOCB rcuog kthreads on expedited grace period completion
  rcu: Detect expedited grace period completion in rcu_pending()
  rcu: Advance callbacks for expedited GP completion in rcu_core()
  rcuscale: Add concurrent expedited GP threads for callback scaling tests

 include/linux/rcu_segcblist.h |  16 ++--
 include/linux/rcupdate.h      |  13 ++-
 include/linux/rcupdate_wait.h |   2 +-
 include/linux/rcutiny.h       |  36 ++++-----
 include/linux/rcutree.h       |  29 +++----
 include/trace/events/rcu.h    |   5 +-
 kernel/rcu/rcu.h              |  13 ++-
 kernel/rcu/rcu_segcblist.c    | 139 ++++++++++++++++++++++----------
 kernel/rcu/rcu_segcblist.h    |   8 +-
 kernel/rcu/rcuscale.c         |  84 ++++++++++++++++++-
 kernel/rcu/rcutorture.c       |  30 +++----
 kernel/rcu/srcutree.c         |  14 ++--
 kernel/rcu/tasks.h            |   8 +-
 kernel/rcu/tiny.c             |   4 +-
 kernel/rcu/tree.c             | 147 ++++++++++++++++++++++------------
 kernel/rcu/tree.h             |   3 +-
 kernel/rcu/tree_exp.h         |  20 ++---
 kernel/rcu/tree_nocb.h        | 131 ++++++++++++++++++++++++------
 mm/slab_common.c              |   6 +-
 19 files changed, 496 insertions(+), 212 deletions(-)

base-commit: 709d17a22bfac78765f6cbaec42e15bcd4aa4f08
-- 
2.53.0-Meta
