On 2/8/26 10:41 PM, Chen Ridong wrote:

On 2026/2/7 4:37, Waiman Long wrote:
Clarify the locking rules associated with file level internal variables
inside the cpuset code. There is no functional change.

Signed-off-by: Waiman Long <[email protected]>
---
  kernel/cgroup/cpuset.c | 105 ++++++++++++++++++++++++-----------------
  1 file changed, 61 insertions(+), 44 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c43efef7df71..a4c6386a594d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -61,6 +61,58 @@ static const char * const perr_strings[] = {
        [PERR_REMOTE]    = "Have remote partition underneath",
  };
+/*
+ * CPUSET Locking Convention
+ * -------------------------
+ *
+ * Below are the three global locks guarding cpuset structures in lock
+ * acquisition order:
+ *  - cpu_hotplug_lock (cpus_read_lock/cpus_write_lock)
+ *  - cpuset_mutex
+ *  - callback_lock (raw spinlock)
+ *
+ * A task must hold all the three locks to modify externally visible or
+ * used fields of cpusets, though some of the internally used cpuset fields
+ * and internal variables can be modified without holding callback_lock. If only
+ * reliable read access of the externally used fields are needed, a task can
+ * hold either cpuset_mutex or callback_lock which are exposed to other
+ * external subsystems.
+ *
+ * If a task holds cpu_hotplug_lock and cpuset_mutex, it blocks others,
+ * ensuring that it is the only task able to also acquire callback_lock and
+ * be able to modify cpusets.  It can perform various checks on the cpuset
+ * structure first, knowing nothing will change. It can also allocate memory
+ * without holding callback_lock. While it is performing these checks, various
+ * callback routines can briefly acquire callback_lock to query cpusets.  Once
+ * it is ready to make the changes, it takes callback_lock, blocking everyone
+ * else.
+ *
+ * Calls to the kernel memory allocator cannot be made while holding
+ * callback_lock which is a spinlock, as the memory allocator may sleep or
+ * call back into cpuset code and acquire callback_lock.
+ *
+ * Now, the task_struct fields mems_allowed and mempolicy may be changed
+ * by other task, we use alloc_lock in the task_struct fields to protect
+ * them.
+ *
+ * The cpuset_common_seq_show() handlers only hold callback_lock across
+ * small pieces of code, such as when reading out possibly multi-word
+ * cpumasks and nodemasks.
+ */
+
+static DEFINE_MUTEX(cpuset_mutex);
+
+/*
+ * File level internal variables below follow one of the following exclusion
+ * rules.
+ *
+ * RWCS: Read/write-able by holding either cpus_write_lock or both
+ *       cpus_read_lock and cpuset_mutex.
+ *
Does this mean that variables can be read or written only by holding
cpus_write_lock?

I believe that to write cpuset variables, we must hold either (cpus_write_lock
and cpuset_mutex) or (cpus_read_lock and cpuset_mutex).

The point of the locking rule is to state the condition for mutual exclusion. Once cpus_write_lock is held, no other task can hold cpus_read_lock and cpuset_mutex. I consider holding cpuset_mutex optional in that case, though almost all the cpuset internal variables are accessed from the CPU hotplug side with both cpus_write_lock and cpuset_mutex held. The only exception is force_sd_rebuild (sd_rebuild), which can be set directly from the scheduling code without holding cpuset_mutex. I can change it to "holding cpus_write_lock (and optionally cpuset_mutex) or both cpus_read_lock and cpuset_mutex" if that makes it clearer.

Cheers,
Longman

