cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue

Waiman Long Fri, 30 Jan 2026 17:46:37 -0800

On 1/30/26 7:58 PM, Chen Ridong wrote:


On 2026/1/30 23:42, Waiman Long wrote:

The update_isolation_cpumasks() function can be called either directly
from regular cpuset control file write with cpuset_full_lock() called
or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.

As we are going to enable dynamic update to the nohz_full housekeeping
cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
allowing the CPU hotplug path to call into housekeeping_update() directly
from update_isolation_cpumasks() will likely cause deadlock. So we
have to defer any call to housekeeping_update() after the CPU hotplug
operation has finished. This is now done via the workqueue where
the actual housekeeping_update() call, if needed, will happen after
cpus_write_lock is released.

We can't use the synchronous task_work API as calls from the CPU hotplug
path happen in the per-cpu kthread of the CPU that is being shut down
or brought up. Because of the asynchronous nature of workqueue, the
HK_TYPE_DOMAIN housekeeping cpumask will be updated a bit later than the
"cpuset.cpus.isolated" control file in this case.

Also add a check in test_cpuset_prs.sh and modify some existing
test cases to confirm that "cpuset.cpus.isolated" and HK_TYPE_DOMAIN
housekeeping cpumask will both be updated.

Signed-off-by: Waiman Long <[email protected]>
---
  kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
  .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
  2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7b7d12ab1006..0b0eb1df09d5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -84,6 +84,9 @@ static cpumask_var_t  isolated_cpus;
   */
  static bool isolated_cpus_updating;

+/* Both cpuset_mutex and cpus_read_locked acquired */
+static bool cpuset_locked;
+
  /*
   * A flag to force sched domain rebuild at the end of an operation.
   * It can be set in
@@ -285,10 +288,12 @@ void cpuset_full_lock(void)
  {
        cpus_read_lock();
        mutex_lock(&cpuset_mutex);
+       cpuset_locked = true;
  }

  void cpuset_full_unlock(void)
  {
+       cpuset_locked = false;
        mutex_unlock(&cpuset_mutex);
        cpus_read_unlock();
  }
@@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
        return false;
  }

+static void isolcpus_workfn(struct work_struct *work)
+{
+       cpuset_full_lock();
+       if (isolated_cpus_updating) {
+               WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
+               isolated_cpus_updating = false;
+       }
+       cpuset_full_unlock();
+}
+
  /*
   * update_isolation_cpumasks - Update external isolation related CPU masks
   *
@@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
   */
  static void update_isolation_cpumasks(void)
  {
-       int ret;
+       static DECLARE_WORK(isolcpus_work, isolcpus_workfn);

        if (!isolated_cpus_updating)
                return;

-       ret = housekeeping_update(isolated_cpus);
-       WARN_ON_ONCE(ret < 0);
+       /*
+        * This function can be reached either directly from regular cpuset
+        * control file write (cpuset_locked) or via hotplug (cpus_write_lock
+        * && cpuset_mutex held). In the latter case, we defer the
+        * housekeeping_update() call to the system_unbound_wq to avoid the
+        * possibility of deadlock. This also means that there will be a short
+        * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
+        * behind isolated_cpus.
+        */
+       if (!cpuset_locked) {

Adding a global variable makes this difficult to handle, especially in
concurrent scenarios, since we could read it outside of a critical region.

No, cpuset_locked is always read from or written to inside a critical
section. It is under cpuset_mutex up to this point, and then under
cpuset_top_mutex with the next patch.


I suggest removing cpuset_locked and adding async_update_isolation_cpumasks
instead, which can indicate to the caller it should call without holding the
full lock.

The point of this global variable is to distinguish between calls from
CPU hotplug and calls from the other regular cpuset code paths. The only
difference between the two is whether cpus_read_lock or cpus_write_lock
is held. That is why I think setting a global variable in
cpuset_full_lock() is the easy way. Otherwise, we will have to add an
extra argument to some of the functions to distinguish these two cases.


Cheers,
Longman

Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer housekeeping_update() call from CPU hotplug to workqueue