On 12/8/25 9:32 AM, Michal Koutný wrote:
Hi Waiman.
On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long <[email protected]> wrote:
Modifications to cpumasks are all serialized by the cpuset_mutex. If you are
referring to 2 or more tasks doing parallel updates to various cpuset
control files of sibling cpusets, the results can actually vary depending on
the order in which those operations end up being serialized.
I meant the latter: the difference in results when concurrent tasks
do the update (e.g. two containers start in parallel). I don't see an
issue with the race wrt consistency of in-kernel data. We're on the same
page here.
One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact
that operations on cpuset.cpus.exclusive can fail if the result is not
exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed
unless none of the exclusive CPUs are passed down from the parent. The use
of cpuset.cpus.exclusive is required for creating a remote partition.
OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition
root is not guaranteed and is limited to the creation of a local partition
only.
Does that answer your question?
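
To make the cpuset.cpus.exclusive failure mode concrete, here is a minimal
user-space sketch in Python that models only the sibling-exclusivity rule
described above. It is not kernel code: parse_cpu_list() and
exclusive_write_ok() are hypothetical helper names, and the real kernel
check looks at more state than this (for example, what the parent passes
down).

```python
def parse_cpu_list(text):
    """Parse a cpuset-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def exclusive_write_ok(new_value, sibling_values):
    """Toy model (assumption): a write to cpuset.cpus.exclusive is rejected
    when the new CPUs overlap any sibling's exclusive CPUs."""
    new = parse_cpu_list(new_value)
    return all(new.isdisjoint(parse_cpu_list(s)) for s in sibling_values)
```

Under this model, writing "4-7" next to a sibling holding "0-3" is accepted,
while writing "2-5" is rejected. A write to cpuset.cpus has no such
rejection path, which is why it never fails but also gives no guarantee of
ending up with a valid partition root.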
It does help my understanding. Do you envision that remote and local
partitions should be used together (in one subtree)?
It should be rare to have both remote and local partitions enabled in the
same system, though it is not disallowed. Local partitions should
only be used on systems that run a small number of applications, with one
or just a few that need partition support. For systems that run a large
number of containerized applications, like a Kubernetes-managed system,
local partitions cannot be used because of the way container management
is done: the actual cgroups associated with a container can be a
bit far from the cgroup root. Remote partitions were created for such a
use case, where local partitions cannot be used at all.
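
As a rough illustration of the container case, the following Python sketch
only builds the ordered list of writes that a management tool might issue
to create a remote partition several levels below the cgroup root. It
assumes the exclusive CPUs have to be passed down through every
intermediate cgroup via cpuset.cpus.exclusive before cpuset.cpus.partition
is set. remote_partition_plan() and the example path are hypothetical, and
other prerequisites (enabling the cpuset controller in
cgroup.subtree_control, any cpuset.cpus settings) are left out.

```python
from pathlib import PurePosixPath


def remote_partition_plan(cgroup, cpus, root="/sys/fs/cgroup"):
    """Return the (file, value) writes, in order, for a remote partition.

    `cgroup` is a path relative to the cgroup root, e.g. "a/b/c".
    Nothing is written; the caller decides whether to apply the plan.
    """
    parts = PurePosixPath(cgroup).parts
    if not parts or parts[0] == "/":
        raise ValueError("cgroup must be a non-empty relative path")

    plan = []
    path = PurePosixPath(root)
    # Pass the exclusive CPUs down one level at a time, top to bottom.
    for name in parts:
        path = path / name
        plan.append((str(path / "cpuset.cpus.exclusive"), cpus))
    # Only the last cgroup in the chain becomes a partition root.
    plan.append((str(path / "cpuset.cpus.partition"), "root"))
    return plan
```

For a hypothetical container cgroup at "kubepods/pod1/ctr" and CPUs "4-7",
this yields three cpuset.cpus.exclusive writes followed by one
cpuset.cpus.partition write. None of the intermediate cgroups has to be a
partition root itself, which is the point of the remote case.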
Cheers,
Longman