This series lets call_rcu() callbacks be reclaimed as soon as either a
normal or an expedited grace period that covers them has elapsed, rather
than always waiting for a normal grace period.

Motivation
==========
Today there is an asymmetry: synchronize_rcu_expedited() callers get fast
reclaim, but call_rcu() callers never benefit from those same expedited
grace periods, even though an expedited GP proves exactly the same thing
as a normal one -- all pre-existing readers are done.  When expedited GPs
are running on the system (driven by other subsystems), call_rcu()
callbacks that could already be freed instead sit in RCU_WAIT_TAIL until
the next normal GP.  This series treats a grace period as a grace period
regardless of how it was driven, so memory is reclaimed sooner.

Design
======
Callback segments now record both the normal and expedited grace-period
sequence in struct rcu_gp_seq, and rcu_segcblist_advance() releases a
segment as soon as poll_state_synchronize_rcu_full() reports that either
has completed.  Three notification paths are taught about expedited
completion so the advance actually happens: the NOCB rcuog kthreads,
the rcu_pending() tick gate, and rcu_core().
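
The "either flavor" release rule described above can be pictured with a
small userspace model.  This is only a sketch under stated assumptions,
not the kernel code: the toy_* names are hypothetical, the two-field
cookie merely mirrors the norm/exp fields of struct rcu_gp_seq, and the
real helpers also handle special cookie values and memory ordering that
are left out here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_gp_seq: one cookie per GP flavor. */
struct toy_gp_seq {
	unsigned long norm;	/* normal GP sequence this segment waits for */
	unsigned long exp;	/* expedited GP sequence this segment waits for */
};

/* Hypothetical global state: last completed sequence of each flavor. */
struct toy_rcu_state {
	unsigned long norm_done;
	unsigned long exp_done;
};

/* Wraparound-safe "a >= b" on sequence counters. */
static bool toy_seq_ge(unsigned long a, unsigned long b)
{
	return a - b <= ULONG_MAX / 2;
}

/* A segment is releasable once EITHER flavor has reached its cookie. */
static bool toy_poll_full(const struct toy_rcu_state *s,
			  const struct toy_gp_seq *c)
{
	return toy_seq_ge(s->norm_done, c->norm) ||
	       toy_seq_ge(s->exp_done, c->exp);
}

/*
 * Return how many leading segments (oldest first) can be released.
 * Stops at the first segment that neither flavor has covered yet.
 */
static size_t toy_advance(const struct toy_rcu_state *s,
			  const struct toy_gp_seq *segs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!toy_poll_full(s, &segs[i]))
			break;
	return i;
}
```

In this model, a segment waiting on { norm = 8, exp = 8 } is released
when exp_done reaches 8 even if norm_done is still 4, which is the case
the series is targeting.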

Changelog:
RFC: https://lore.kernel.org/all/[email protected]/
Changes in v1:
 - New prep patch 1 renames struct rcu_gp_oldstate to struct rcu_gp_seq
   and its fields rgos_norm/rgos_exp to norm/exp tree-wide (Frederic).
 - The rcu_segcblist segment field stays named gp_seq; only its type
   changes (Frederic).
 - Patch 8 (NOCB wake) is reworked.  The RFC woke the wrong waitqueue
   (rdp_gp->nocb_gp_wq via wake_nocb_gp() rather than the leaf
   rnp->nocb_gp_wq[] that an rcuog kthread waiting for a GP sleeps on),
   and its wait condition only checked the normal ->gp_seq.  The rcuog
   grace-period wait now tracks a struct rcu_gp_seq and is released via
   poll_state_synchronize_rcu_full(); rcu_exp_wait_wake() wakes the leaf
   node through the new rcu_nocb_exp_cleanup() (Frederic).
 - rcu_pending() uses a new memory-ordering-free
   poll_state_synchronize_rcu_full_unordered() to avoid memory barriers
   on every tick, leaving the ordering duty to rcu_core() (Frederic).

Still open: Frederic asked whether the first smp_mb() in
poll_state_synchronize_rcu_full() is needed on the callback-advance path
(patch 6).  That path still uses the fully ordered helper; only
rcu_pending() was switched to the unordered variant.  Happy to revisit.
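
For readers following the ordering discussion, the split between a cheap
tick-time check and a fully ordered recheck can be modelled in userspace
with C11 atomics.  This is a rough sketch under stated assumptions: every
tick_* name is hypothetical, seq_cst fences stand in for smp_mb(), and
the fence placement is only an approximation of the ordered helper, not
a copy of it.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completed-sequence counters, one per GP flavor. */
static _Atomic unsigned long tick_norm_done;
static _Atomic unsigned long tick_exp_done;

/* Hypothetical two-field cookie, mirroring norm/exp of struct rcu_gp_seq. */
struct tick_cookie {
	unsigned long norm;
	unsigned long exp;
};

/* Wraparound-safe "a >= b" on sequence counters. */
static bool tick_seq_ge(unsigned long a, unsigned long b)
{
	return a - b <= ULONG_MAX / 2;
}

/* Unordered poll: relaxed loads only, cheap enough for every tick. */
static bool tick_poll_unordered(const struct tick_cookie *c)
{
	unsigned long n = atomic_load_explicit(&tick_norm_done,
					       memory_order_relaxed);
	unsigned long e = atomic_load_explicit(&tick_exp_done,
					       memory_order_relaxed);

	return tick_seq_ge(n, c->norm) || tick_seq_ge(e, c->exp);
}

/* Ordered poll: full fence before the check and again on success. */
static bool tick_poll_ordered(const struct tick_cookie *c)
{
	bool done;

	atomic_thread_fence(memory_order_seq_cst);
	done = tick_poll_unordered(c);
	if (done)
		atomic_thread_fence(memory_order_seq_cst);
	return done;
}

/* Tick gate: only decides whether the heavier path is worth running. */
static bool tick_pending(const struct tick_cookie *c)
{
	return tick_poll_unordered(c);
}

/* Core path: rechecks with full ordering before touching callbacks. */
static bool tick_core_advance(const struct tick_cookie *c)
{
	return tick_poll_ordered(c);
}
```

The point of the split in this model is that a stale answer from the
unordered gate costs at most one extra tick, while the decision to
release callbacks is always made under the ordered check.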

Puranjay Mohan (11):
  rcu: Rename struct rcu_gp_oldstate to rcu_gp_seq
  rcu/segcblist: Add SRCU and Tasks RCU wrapper functions
  rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper
  rcu/segcblist: Track segment grace periods with struct rcu_gp_seq
  rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited
    GPs
  rcu: Enable RCU callbacks to benefit from expedited grace periods
  rcu: Update comments for gp_seq and expedited GP tracking
  rcu: Wake NOCB rcuog kthreads on expedited grace period completion
  rcu: Detect expedited grace period completion in rcu_pending()
  rcu: Advance callbacks for expedited GP completion in rcu_core()
  rcuscale: Add concurrent expedited GP threads for callback scaling
    tests

 include/linux/rcu_segcblist.h |  16 ++--
 include/linux/rcupdate.h      |  13 ++-
 include/linux/rcupdate_wait.h |   2 +-
 include/linux/rcutiny.h       |  36 ++++-----
 include/linux/rcutree.h       |  29 +++----
 include/trace/events/rcu.h    |   5 +-
 kernel/rcu/rcu.h              |  13 ++-
 kernel/rcu/rcu_segcblist.c    | 139 ++++++++++++++++++++++----------
 kernel/rcu/rcu_segcblist.h    |   8 +-
 kernel/rcu/rcuscale.c         |  84 ++++++++++++++++++-
 kernel/rcu/rcutorture.c       |  30 +++----
 kernel/rcu/srcutree.c         |  14 ++--
 kernel/rcu/tasks.h            |   8 +-
 kernel/rcu/tiny.c             |   4 +-
 kernel/rcu/tree.c             | 147 ++++++++++++++++++++++------------
 kernel/rcu/tree.h             |   3 +-
 kernel/rcu/tree_exp.h         |  20 ++---
 kernel/rcu/tree_nocb.h        | 131 ++++++++++++++++++++++++------
 mm/slab_common.c              |   6 +-
 19 files changed, 496 insertions(+), 212 deletions(-)


base-commit: 709d17a22bfac78765f6cbaec42e15bcd4aa4f08
-- 
2.53.0-Meta

