This is an automated email from the ASF dual-hosted git repository.

bmahler pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
     new c1b42f79d [docs] Add public docs for Cgroups v2.
c1b42f79d is described below

commit c1b42f79d157aa900be5cbb0c1023f1479962aa7
Author: Jason Zhou <[email protected]>
AuthorDate: Fri Aug 23 14:59:00 2024 -0400

    [docs] Add public docs for Cgroups v2.
    
    Currently there is no official documentation outlining the changes
    we have been making to support Cgroups v2.
    
    We add a main document outlining how Mesos interacts with Cgroups v2,
    and update some documents on the changes that were made, such as
    the device isolator document.
    
    Review: https://reviews.apache.org/r/75191/
---
 docs/cgroups2-support.md          | 90 +++++++++++++++++++++++++++++++++++++++
 docs/isolators/cgroups-devices.md |  7 +++
 docs/mesos-containerizer.md       |  5 +++
 3 files changed, 102 insertions(+)

diff --git a/docs/cgroups2-support.md b/docs/cgroups2-support.md
new file mode 100644
index 000000000..692eab23b
--- /dev/null
+++ b/docs/cgroups2-support.md
@@ -0,0 +1,90 @@
+---
+title: Apache Mesos - Cgroups v2 Support
+layout: documentation
+---
+
+# Using Mesos on systems with Cgroups2 enabled
+
+As part of the move towards Cgroups2, the Cgroups isolator has been updated to
+support the updated interface. Changes are outlined below, and it is recommended
+to read up on the [Cgroups v2](https://docs.kernel.org/admin-guide/cgroup-v2.html)
+documentation for a deeper understanding.
+
+### Requirements
+
+The `cgroup2` filesystem must be mounted at `/sys/fs/cgroup`. This allows Mesos
+to pick the Cgroups2 Isolator when creating the Mesos Containerizer.
+
+### Cgroup Names
+
+A cgroup called “CGROUP_NAME” has a path `/sys/fs/cgroup/$CGROUP_NAME`. This
+applies for all cgroups. A cgroup's name is the cgroup's path relative to
+`/sys/fs/cgroup`, where the cgroup2 filesystem is mounted.
+
+`flags.cgroups_root` (default: "mesos"): Root cgroup name.
+
+The client has control over the name of the root cgroup subtree under
+`/sys/fs/cgroup` that Mesos manages. The default name is “mesos”.
+
+### Process Cgroup
+
+Every process Mesos manages will have a cgroup, and a leaf cgroup under it which
+contains the pids. This is done to adhere to the [No Internal Process Constraint](https://docs.kernel.org/admin-guide/cgroup-v2.html#no-internal-process-constraint)
+imposed by Cgroups v2.
+
+### Container
+
+When the cgroups v2 isolator is `prepare`d for a new container, cgroups are
+created for the new container. When the cgroups v2 isolator `isolate`s, the new
+container is moved into its leaf cgroup.
+
+Container Non-leaf Cgroup: `<flags.cgroups_root>/<containerId>`
+
+Container Leaf Cgroup: `<flags.cgroups_root>/<containerId>/leaf`
+
+### Nested Containers
+
+The Cgroups v2 isolator supports nested containers.
+
+Unlike Cgroups v1, we now create cgroups for all containers, even if they
+indicated they do not want their own resource isolation. This is to make it
+easier to keep track of a container’s processes.
+
+If a container does not wish to have its own resource isolation, it can pass in
+a flag `share_cgroups` and the isolator will not update any controllers for it.
+
+### Systemd Integration
+
+We currently do not have systemd integration. This section should be updated
+with our approach if systemd support is implemented.
+
+### Linux Launcher & Cgroups v2 Isolator
+
+On Linux systems that support cgroups v2, the Mesos Containerizer will use the [Linux Launcher](https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/linux_launcher.cpp) and the [Cgroups v2 Isolator](https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/cgroups2/cgroups2.cpp).
+
+It’s recommended to review the code to gain a complete understanding of these steps.
+
+Operations on startup:
+
+- Linux Launcher `recover`: Parses the cgroups subtree rooted at
+`flags.cgroups_root` to obtain container ids. Compares the persisted state to
+the recovered containers to determine which containers are orphans.
+- Cgroups v2 Isolator `recover`: Creates internal state to track recovered
+containers. Calls `recover` on all of the controllers that are used by each of
+the recovered containers.
+
+Operations when a new container is started:
+
+- Cgroups v2 Isolator `prepare`: Creates cgroups for the new container and adds
+the container to the isolator's internal state. Configures namespace creation flags
+and mount setups; does not create mounts or namespaces. Calls `prepare` on all
+of the controllers that are used by the new container.
+- Linux Launcher `fork`: Forks the Mesos Agent process to create the new
+container's process. Also moves the child processes into the container's leaf
+cgroup. Creates mounts and namespaces.
+- Cgroups v2 Isolator `watch`: Calls `watch` on each of the controllers that
+are used by the container. When a resource-watch promise is resolved a handler
+is invoked.
+- Cgroups v2 Isolator `isolate`: Calls `isolate` on each of the controllers that
+are used by the container. Then moves the container process into the container's
+leaf cgroup; at this point the container is isolated.
\ No newline at end of file
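
Reader's note on docs/cgroups2-support.md: the naming rules in the "Cgroup Names" and "Container" sections can be sketched in a few lines of Python. This is only an illustration of the path layout the document describes. `container_cgroups` is a hypothetical helper and not part of Mesos, the container id is made up, and only top-level containers are covered because the document does not spell out the layout for nested ones.

```python
from pathlib import PurePosixPath

# Mount point of the cgroup2 filesystem, as stated in the document.
CGROUP2_MOUNT = PurePosixPath("/sys/fs/cgroup")


def container_cgroups(container_id: str, cgroups_root: str = "mesos") -> dict:
    """Return cgroup names and absolute paths for a top-level container.

    Hypothetical helper for illustration only; not a Mesos API.
    A cgroup's name is its path relative to the cgroup2 mount point.
    """
    non_leaf = PurePosixPath(cgroups_root) / container_id  # <cgroups_root>/<containerId>
    leaf = non_leaf / "leaf"                                # <cgroups_root>/<containerId>/leaf
    return {
        "non_leaf_name": str(non_leaf),
        "leaf_name": str(leaf),
        "non_leaf_path": str(CGROUP2_MOUNT / non_leaf),
        "leaf_path": str(CGROUP2_MOUNT / leaf),
    }


if __name__ == "__main__":
    # "abc123" is a made-up container id.
    for key, value in container_cgroups("abc123").items():
        print(f"{key}: {value}")
```

The leaf cgroup is the one that would hold the container's pids, which is how the layout satisfies the No Internal Process Constraint mentioned in the document.
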
diff --git a/docs/isolators/cgroups-devices.md b/docs/isolators/cgroups-devices.md
index 0807bd79e..f8c4ec861 100644
--- a/docs/isolators/cgroups-devices.md
+++ b/docs/isolators/cgroups-devices.md
@@ -12,6 +12,13 @@ track and enforce open and mknod restrictions on device files. To enable the
 `cgroups/devices` isolator, append `cgroups/devices` to the `--isolation` flag
 when starting the Mesos agent.
 
+## Changes for Cgroups2 Support
+
+In Cgroups2, we create eBPF programs to keep track of which devices
+are allowed or denied access. This is because cgroups2 no longer offers
+interface files for device access controls. Our default whitelisted devices list
+remains unchanged for cgroups2.
+
 ## Default whitelisted devices
 
 The following devices are, by default, whitelisted for each container, if you
diff --git a/docs/mesos-containerizer.md b/docs/mesos-containerizer.md
index 9a7b25d2b..6c2c5ba9a 100644
--- a/docs/mesos-containerizer.md
+++ b/docs/mesos-containerizer.md
@@ -77,3 +77,8 @@ unit file of Mesos agent, for example:
 [Service]
 Delegate=true
 ```
+
+## Cgroups2 Integration
+
+The changes made to support the new requirements of Cgroups v2 are
+documented in the [Cgroups2 Support](cgroups2-support.md) documentation.
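
Reader's note: the new support document requires the cgroup2 filesystem to be mounted at `/sys/fs/cgroup`. A rough way to check this on a host is to scan a mount table in the `/proc/mounts` format. The sketch below is not how Mesos performs its own detection. `cgroup2_mounted_at` is a hypothetical helper, and the check ignores edge cases such as a later mount shadowing an earlier one.

```python
import os


def cgroup2_mounted_at(mounts_text: str, mount_point: str = "/sys/fs/cgroup") -> bool:
    """Return True if a cgroup2 filesystem is mounted exactly at mount_point.

    Hypothetical helper for illustration only; not a Mesos API.
    Each line of /proc/mounts has the form:
        <device> <mount point> <fs type> <options> <dump> <pass>
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "cgroup2":
            return True
    return False


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print("cgroup2 at /sys/fs/cgroup:", cgroup2_mounted_at(f.read()))
    else:
        print("/proc/mounts not available on this system")
```

On hosts that still use the v1 or hybrid layout, `/sys/fs/cgroup` is typically a `tmpfs` with the v2 hierarchy, if any, mounted elsewhere, so the check returns False there.
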
