On 2/12/26 10:28 PM, Chen Ridong wrote:

On 2026/2/13 0:46, Waiman Long wrote:
As any change to isolated_cpus is going to be propagated to the
HK_TYPE_DOMAIN housekeeping cpumask, it can be problematic if
housekeeping cpumasks are directly being modified from the CPU hotplug
code path. This is especially the case if we are going to enable dynamic
update to the nohz_full housekeeping cpumask (HK_TYPE_KERNEL_NOISE)
in the near future with the help of CPU hotplug.

Avoid these potential problems by changing the cpuset code to not
update isolated_cpus when called from CPU hotplug. A new special
PRS_INVALID_ISOLCPUS state is added to indicate that the current cpuset
is an invalid partition but its effective_xcpus are still in isolated_cpus.
This special state will be set if an isolated partition becomes invalid
due to the shutdown of the last active CPU in that partition. We also
need to keep the effective_xcpus even if exclusive_cpus isn't set.

When changes are made to "cpuset.cpus", "cpuset.cpus.exclusive" or
"cpuset.cpus.partition" of a PRS_INVALID_ISOLCPUS cpuset, its state
will be reset back to PRS_INVALID_ISOLATED and its effective_xcpus will
be removed from isolated_cpus before proceeding.
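
The two-step transition described above can be modelled in userspace. This is only a sketch, not the kernel code: it assumes cpumasks can be stood in for by 64-bit bitmasks (one bit per CPU), it ignores locking, and struct model_cpuset and both model_* functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Partition states, using the values from the patch's #defines. */
enum {
	PRS_ISOLATED         =  2,
	PRS_INVALID_ISOLATED = -2,
	PRS_INVALID_ISOLCPUS = -3,
};

/* Hypothetical stand-in for struct cpuset; one bit per CPU. */
struct model_cpuset {
	int state;
	uint64_t exclusive_cpus;
	uint64_t effective_xcpus;
};

/* Stand-in for the global isolated_cpus mask. */
static uint64_t isolated_cpus;

/*
 * Hotplug side: the last active CPU of an isolated partition went
 * offline. The partition becomes invalid, but isolated_cpus and
 * effective_xcpus are deliberately left untouched.
 */
static void model_hotplug_invalidate(struct model_cpuset *cs)
{
	assert(cs->state == PRS_ISOLATED);
	cs->state = PRS_INVALID_ISOLCPUS;
}

/*
 * Write side: before handling a change to cpuset.cpus,
 * cpuset.cpus.exclusive or cpuset.cpus.partition, drop the stale CPUs
 * from isolated_cpus and fall back to PRS_INVALID_ISOLATED.
 */
static void model_fix_invalid_isolcpus(struct model_cpuset *cs)
{
	if (cs->state != PRS_INVALID_ISOLCPUS)
		return;
	isolated_cpus &= ~cs->effective_xcpus;
	if (!cs->exclusive_cpus)
		cs->effective_xcpus = 0;
	cs->state = PRS_INVALID_ISOLATED;
}
```

In this model, the hotplug step changes only the state, and the isolated_cpus update is deferred to the next write to one of the control files.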

As CPU hotplug will no longer update isolated_cpus, some of the test
cases in test_cpuset_prs.sh will have to be updated to match the new
expected results. Some new test cases are also added to confirm that
"cpuset.cpus.isolated" and HK_TYPE_DOMAIN housekeeping cpumask will
both be updated.

Signed-off-by: Waiman Long <[email protected]>
---
  kernel/cgroup/cpuset.c                        | 85 ++++++++++++++++---
  .../selftests/cgroup/test_cpuset_prs.sh       | 21 +++--
  2 files changed, 87 insertions(+), 19 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c792380f9b60..48b7f275085b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -159,6 +159,8 @@ static bool force_sd_rebuild;                       /* RWCS */
   *   2 - partition root without load balancing (isolated)
   *  -1 - invalid partition root
   *  -2 - invalid isolated partition root
+ *  -3 - invalid isolated partition root but with effective xcpus still
+ *      in isolated_cpus (set from CPU hotplug side)
   *
   *  There are 2 types of partitions - local or remote. Local partitions are
   *  those whose parents are partition root themselves. Setting of
@@ -187,6 +189,7 @@ static bool force_sd_rebuild;                       /* RWCS */
  #define PRS_ISOLATED          2
  #define PRS_INVALID_ROOT      -1
  #define PRS_INVALID_ISOLATED  -2
+#define PRS_INVALID_ISOLCPUS   -3 /* Effective xcpus still in isolated_cpus */
  /*
   * Temporary cpumasks for working with partitions that are passed among
@@ -382,6 +385,30 @@ static inline bool is_in_v2_mode(void)
              (cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
  }
+/*
+ * If the given cpuset has a partition state of PRS_INVALID_ISOLCPUS,
+ * remove its effective_xcpus from isolated_cpus and reset its state to
+ * PRS_INVALID_ISOLATED. Also clear effective_xcpus if exclusive_cpus is
+ * empty.
+ */
+static void fix_invalid_isolcpus(struct cpuset *cs, struct cpuset *trialcs)
+{
+       if (likely(cs->partition_root_state != PRS_INVALID_ISOLCPUS))
+               return;
+       WARN_ON_ONCE(cpumask_empty(cs->effective_xcpus));
+       spin_lock_irq(&callback_lock);
+       cpumask_andnot(isolated_cpus, isolated_cpus, cs->effective_xcpus);
+       if (cpumask_empty(cs->exclusive_cpus))
+               cpumask_clear(cs->effective_xcpus);
+       cs->partition_root_state = PRS_INVALID_ISOLATED;
+       spin_unlock_irq(&callback_lock);
+       isolated_cpus_updating = true;
+       if (trialcs) {
+               trialcs->partition_root_state = PRS_INVALID_ISOLATED;
+               cpumask_copy(trialcs->effective_xcpus, cs->effective_xcpus);
+       }
+}
When fix_invalid_isolcpus() is called while changing cpuset.cpus or
cpuset.cpus.exclusive, should we copy cs->effective_xcpus to
trialcs->effective_xcpus?

I tested with the following steps (using the whole series):

  # cd /sys/fs/cgroup/
  # mkdir test
  # echo 1 > cpuset.cpus.
  # cd test/
  # echo 1 > cpuset.cpus.exclusive
  # echo $$ > cgroup.procs
  # echo isolated > cpuset.cpus.partition
  # cat cpuset.cpus.partition
isolated
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat cpuset.cpus.partition
isolated invalid
  # echo 2 > cpuset.cpus.exclusive
  # cat cpuset.cpus.partition
isolated invalid (Parent unable to distribute cpu downstream)

After changing cpuset.cpus.exclusive to 2, the test cpuset should
become valid again, but it remains invalid.

Right, the change to trialcs->effective_xcpus is unnecessary, as compute_trialcs_excpus() will already have been called before fix_invalid_isolcpus() is invoked. I will fix that in the next version.
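
The ordering problem can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code path: cpumasks are stood in for by 64-bit bitmasks, model_compute_trial_xcpus() is a hypothetical simplification of compute_trialcs_excpus() that just takes the requested exclusive CPUs, and the copy_to_trial flag is invented here to switch the disputed cpumask_copy() on and off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset; one bit per CPU. */
struct model_cpuset {
	uint64_t exclusive_cpus;
	uint64_t effective_xcpus;
};

/* Simplified stand-in for compute_trialcs_excpus() (an assumption). */
static void model_compute_trial_xcpus(struct model_cpuset *trial)
{
	trial->effective_xcpus = trial->exclusive_cpus;
}

/* Models only the trialcs part of fix_invalid_isolcpus(). */
static void model_fix(const struct model_cpuset *cs,
		      struct model_cpuset *trial, int copy_to_trial)
{
	if (copy_to_trial)
		trial->effective_xcpus = cs->effective_xcpus;
}

/*
 * Models a write to cpuset.cpus.exclusive: the trial value is computed
 * first, then the fixup runs. Returns the effective_xcpus that the
 * later validation would see.
 */
static uint64_t model_write_exclusive(const struct model_cpuset *cs,
				      uint64_t new_excl, int copy_to_trial)
{
	struct model_cpuset trial = *cs;

	trial.exclusive_cpus = new_excl;
	model_compute_trial_xcpus(&trial);
	model_fix(cs, &trial, copy_to_trial);
	return trial.effective_xcpus;
}
```

With cs holding the offline CPU 1 and a write of CPU 2, the model returns CPU 1 when the copy is enabled and CPU 2 when it is not. That matches the reproduction above, where the partition stays invalid after the write, although the real validation logic is more involved than this model.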

Thanks,
Longman