For users who want to be sure that they can get most of the CPUs they
want, cpuset.cpus.exclusive should be used instead if they can set
it successfully without failure. Setting cpuset.cpus.exclusive will
guarantee that sibling conflicts from then onward is no longer possible.
To make this change, we have to separate out the is_cpu_exclusive()
check in cpus_excl_conflict() into a cgroup v1 only
cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
helper is now no longer needed and can be removed.
Some existing tests in test_cpuset_prs.sh are updated and new ones are
added to reflect the new behavior. The cgroup-v2.rst doc file is also
updated the clarify what exclusive CPUs will be used when a partition
is created.
Reported-by: Sun Shaojie <[email protected]>
Closes:
https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Waiman Long <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 32 +++++---
kernel/cgroup/cpuset-internal.h | 3 +
kernel/cgroup/cpuset-v1.c | 19 +++++
kernel/cgroup/cpuset.c | 80 ++++++-------------
.../selftests/cgroup/test_cpuset_prs.sh | 26 ++++--
5 files changed, 89 insertions(+), 71 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst
b/Documentation/admin-guide/cgroup-v2.rst
index 510df2461aff..a3446db96cea 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2584,9 +2584,9 @@ Cpuset Interface Files
of this file will always be a subset of its parent's
"cpuset.cpus.exclusive.effective" if its parent is not the root
cgroup. It will also be a subset of "cpuset.cpus.exclusive"
- if it is set. If "cpuset.cpus.exclusive" is not set, it is
- treated to have an implicit value of "cpuset.cpus" in the
- formation of local partition.
+ if it is set. This file should only be non-empty if either
+ "cpuset.cpus.exclusive" is set or when the current cpuset is
+ a valid partition root.
cpuset.cpus.isolated
A read-only and root cgroup only multiple values file.
@@ -2618,13 +2618,22 @@ Cpuset Interface Files
There are two types of partitions - local and remote. A local
partition is one whose parent cgroup is also a valid partition
root. A remote partition is one whose parent cgroup is not a
- valid partition root itself. Writing to "cpuset.cpus.exclusive"
- is optional for the creation of a local partition as its
- "cpuset.cpus.exclusive" file will assume an implicit value that
- is the same as "cpuset.cpus" if it is not set. Writing the
- proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
- before the target partition root is mandatory for the creation
- of a remote partition.
+ valid partition root itself.
+
+ Writing to "cpuset.cpus.exclusive" is optional for the creation
+ of a local partition as its "cpuset.cpus.exclusive" file will
+ assume an implicit value that is the same as "cpuset.cpus" if it
+ is not set. Writing the proper "cpuset.cpus.exclusive" values
+ down the cgroup hierarchy before the target partition root is
+ mandatory for the creation of a remote partition.
+
+ Not all the CPUs requested in "cpuset.cpus.exclusive" can be
+ used to form a new partition. Only those that were present
+ in its parent's "cpuset.cpus.exclusive.effective" control
+ file can be used. For partitions created without setting
+ "cpuset.cpus.exclusive", exclusive CPUs specified in sibling's
+ "cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective"
+ also cannot be used.
Currently, a remote partition cannot be created under a local
partition. All the ancestors of a remote partition root except
@@ -2632,6 +2641,9 @@ Cpuset Interface Files
The root cgroup is always a partition root and its state cannot
be changed. All other non-root cgroups start out as "member".
+ Even though the "cpuset.cpus.exclusive*" control files are not
+ present in the root cgroup, they are implicitly the same as
+ "cpuset.cpus".
When set to "root", the current cgroup is the root of a new
partition or scheduling domain. The set of exclusive CPUs is
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index e718a4f54360..e8e2683cb067 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
struct cpumask *new_cpus, nodemask_t *new_mems,
bool cpus_updated, bool mems_updated);
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
void cpuset1_init(struct cpuset *cs);
void cpuset1_online_css(struct cgroup_subsys_state *css);
int cpuset1_generate_sched_domains(cpumask_var_t **domains,
@@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct
cpuset *cs,
bool cpus_updated, bool mems_updated) {}
static inline int cpuset1_validate_change(struct cpuset *cur,
struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+ struct cpuset *cs2) { return false; }
static inline void cpuset1_init(struct cpuset *cs) {}
static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index ecfea7800f0d..04124c38a774 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct
cpuset *trial)
return ret;
}
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU
conflicts
+ * to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+ if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+ return cpumask_intersects(cs1->cpus_allowed,
+ cs2->cpus_allowed);
+
+ return false;
+}
+
#ifdef CONFIG_PROC_PID_CPUSET
/*
* proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4819ab429771..83fb83a86b4b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -129,6 +129,17 @@ static bool force_sd_rebuild;
* For simplicity, a local partition can be created under a local or remote
* partition but a remote partition cannot have any partition root in its
* ancestor chain except the cgroup root.
+ *
+ * A valid partition can be formed by setting exclusive_cpus or cpus_allowed
+ * if exclusive_cpus is not set. In the case of partition with empty
+ * exclusive_cpus, all the conflicting exclusive CPUs specified in the
+ * following cpumasks of sibling cpusets will be removed from its
+ * cpus_allowed in determining its effective_xcpus.
+ * - effective_xcpus
+ * - exclusive_cpus
+ *
+ * The "cpuset.cpus.exclusive" control file should be used for setting up
+ * partition if the users want to get as many CPUs as possible.
*/
#define PRS_MEMBER 0
#define PRS_ROOT 1
@@ -616,27 +627,25 @@ static inline bool cpusets_are_exclusive(struct cpuset
*cs1, struct cpuset *cs2)
* Returns: true if CPU exclusivity conflict exists, false otherwise
*
* Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
- * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new
exclusive CPUs
+ * o cgroup v1
+ * See cpuset1_cpus_excl_conflict()
+ * o cgroup v2
+ * - The exclusive_cpus values cannot overlap.
+ * - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed.
*/
static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset
*sibling,
bool xcpus_changed)
{
- /* If either cpuset is exclusive, check if they are mutually exclusive
*/
- if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
- return !cpusets_are_exclusive(trial, sibling);
-
- /* Exclusive_cpus cannot intersect */
- if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
- return true;
+ if (!cpuset_v2())
+ return cpuset1_cpus_excl_conflict(trial, sibling);
/* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
return true;
- return false;
+ /* Exclusive_cpus cannot intersect */
+ return cpumask_intersects(trial->exclusive_cpus,
sibling->exclusive_cpus);
}
static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
@@ -2312,43 +2321,6 @@ static enum prs_errcode validate_partition(struct cpuset
*cs, struct cpuset *tri
return PERR_NONE;
}
-static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs,
- struct tmpmasks *tmp)
-{
- int retval;
- struct cpuset *parent = parent_cs(cs);
-
- retval = validate_change(cs, trialcs);
-
- if ((retval == -EINVAL) && cpuset_v2()) {
- struct cgroup_subsys_state *css;
- struct cpuset *cp;
-
- /*
- * The -EINVAL error code indicates that partition sibling
- * CPU exclusivity rule has been violated. We still allow
- * the cpumask change to proceed while invalidating the
- * partition. However, any conflicting sibling partitions
- * have to be marked as invalid too.
- */
- trialcs->prs_err = PERR_NOTEXCL;
- rcu_read_lock();
- cpuset_for_each_child(cp, css, parent) {
- struct cpumask *xcpus = user_xcpus(trialcs);
-
- if (is_partition_valid(cp) &&
- cpumask_intersects(xcpus, cp->effective_xcpus)) {
- rcu_read_unlock();
- update_parent_effective_cpumask(cp,
partcmd_invalidate, NULL, tmp);
- rcu_read_lock();
- }
- }
- rcu_read_unlock();
- retval = 0;
- }
- return retval;
-}
-
/**
* partition_cpus_change - Handle partition state changes due to CPU mask
updates
* @cs: The target cpuset being modified
@@ -2408,15 +2380,15 @@ static int update_cpumask(struct cpuset *cs, struct
cpuset *trialcs,
if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
return 0;
- if (alloc_tmpmasks(&tmp))
- return -ENOMEM;
-
compute_trialcs_excpus(trialcs, cs);
trialcs->prs_err = PERR_NONE;
- retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
+ retval = validate_change(cs, trialcs);
if (retval < 0)
- goto out_free;
+ return retval;
+
+ if (alloc_tmpmasks(&tmp))
+ return -ENOMEM;
/*
* Check all the descendants in update_cpumasks_hier() if
@@ -2439,7 +2411,7 @@ static int update_cpumask(struct cpuset *cs, struct
cpuset *trialcs,
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */
if (cs->partition_root_state)
update_partition_sd_lb(cs, old_prs);
-out_free:
+
free_tmpmasks(&tmp);
return retval;
}
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..ff4540b0490e 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -269,7 +269,7 @@ TEST_MATRIX=(
" C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0
A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0
A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0
A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
- " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0
A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+ " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0
A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2
4-5"
" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4
A3:P2|B1:P2 2-4"
" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4
A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
# Invalid to valid local partition direct transition tests
" C1-3:S+:P2 X4:P2 . . . . . . 0
A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
" C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0
A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
- " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6
A1:P-2|B1:P0"
+ " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6
A1:P2|B1:P0"
" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6
A1:P2|B1:P0 0-3"
# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1
A1:P0|A2:P-2"
" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1
A1:P1|A2:P2 0-1|1"
- # A non-exclusive cpuset.cpus change will invalidate partition and its siblings
- " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3
A1:P-1|B1:P0"
- " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3
A1:P-1|B1:P-1"
- " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3
A1:P0|B1:P-1"
+ # A non-exclusive cpuset.cpus change will not invalidate its siblings
partition.
+ " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3
A1:P1|B1:P0"
+ " C0-1:P1 . . P1:C2-3 C0-2 . . . 0
A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1"
+ " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3
A1:P0|B1:P1"
# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
" C0-3 . . C4-5 X5 . . . 0
A1:0-3|B1:4-5"
@@ -417,6 +417,14 @@ TEST_MATRIX=(
" CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0
A1:1|A2:2-4|B1:5-6 \
A1:P0|A2:P2:B1:P-1 2-4"
+ # When multiple partitions with conflicting cpuset.cpus are created, the
+ # latter created ones will only get what are left of the available
exclusive
+ # CPUs.
+ " C1-3:P1 . . . . . . C3-5:P1 0
A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1"
+
+ # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive
+ " C1-3:X1-3 . . C4-5 . . . C1-2 0
A1:1-3|B1:1-2"
+
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs
Pstate ISOLCPUS
# ------ ------ ------ ------ ------ ------ ------ ------ ---- -----
------ --------
# Failure cases:
@@ -427,7 +435,7 @@ TEST_MATRIX=(
# Changes to cpuset.cpus.exclusive that violate exclusivity rule is
rejected
" C0-3 . . C4-5 X0-3 . . X3-5 1
A1:0-3|B1:4-5"
- # cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
+ # cpuset.cpus.exclusive cannot be set to a superset of sibling's
cpuset.cpus
" C0-3 . . C4-5 X3-5 . . . 1
A1:0-3|B1:4-5"
)
@@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=(
. . X1-2:P2 X4-5:P1 . X1-7:P2
p1:3|c11:1-2|c12:4:c22:5-6 \
p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
1-2,4-6|1-2,5-6"
+ # c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid
partition
+ " C1-5:P1:S+ . C1-4:P1 C2-3 . . \
+ . . . P1 . . p1:5|c11:1-4|c12:5 \
+ p1:P1|c11:P1|c12:P-1"
)
#