Thanks again for all the feedback, Matthew.  I've incorporated the
suggested changes and will send out a v2 shortly.

** Summary changed:

- Re-enable memcg v1 on Noble (6.14)
+ enable MEMCG_V1 and CPUSETS_V1 on Noble HWE

** Description changed:

  [Impact]
  
- Although v1 cgroups are deprecated in Noble, it was still possible for 
+ Although v1 cgroups are deprecated in Noble, it was still possible for
  users on 6.8 kernels to utilize them.  This was especially helpful in
- helping migrating users to Noble and then separately upgrading their
- remaining v1 cgroups applications.  Instead of requiring all users to
- upgrade and fix their v2 support, v1 support could be provisionally
- enabled until the necessary support was available in the applications
- that still lack v2 support.
+ the Noble migration process.  It allowed users to pick up the new OS and
+ then separately upgrade their remaining v1 cgroup applications.  This
+ unblocked the migration path for v1 cgroup users, because v1 support
+ could be provisionally enabled until v2 support became available in the
+ applications that still lacked it.
  
- Starting in 6.12, CONFIG_MEMCG_V1 was added and defaulted to false.
- Noble 6.8 users that were unlucky enough to still need V1 cgroups found
- that they could no longer use memcgs in the 6.14 kernel.
+ Starting in 6.12, CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 were added,
+ defaulting to disabled.  Noble 6.8 users who were unlucky enough to
+ still need these v1 cgroups found that they could no longer use them in
+ the 6.14 kernel.
  
- Specific use cases include older JVMs that fail to correctly handle
- missing controllers from /proc/cgroups.  In that case, the container
- limit detection is turned off and the JVM uses the host's limits.
+ Specific failures encountered include older JVMs that fail to correctly
+ handle missing controllers in /proc/cgroups.  If memory or cpuset is
+ absent, container limit detection is turned off and the JVM falls back
+ to the host's limits.  JVMs configured to use a percentage of the
+ container's memory then end up consuming too much memory and often
+ crash.
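+ 
+ As a quick way to cross-check what a given JVM has detected (on JDKs
+ new enough to report container metrics, roughly 11 and later), the
+ detected limits can be dumped with:
+ 
+    $ java -XshowSettings:system -version
+ 
+ A JVM that has fallen back to host limits will report the host's
+ memory there rather than the container limit.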
  
- Further, Apache Yarn is still completing their v1 -> v2 migration, which
- leaves some Hadoop use cases without proper support.
+ Apache YARN is still completing its v1 -> v2 migration, which leaves
+ some Hadoop use cases without proper support.
  
- The request here is to enable MEMCG_V1 on Noble, but not newer releases,
- for as long as the Noble HWE kernel train still has kernels with cgroup
- v1 support.  This gives users a little bit longer to complete their
- migration while still using newer hardware, but with the understanding
- that this really is the end of the line for v1 cgroups.
+ The request here is to enable these v1 controllers on Noble, but not on
+ newer releases, for as long as the Noble HWE kernel train still has
+ kernels with upstream cgroup v1 support.  This gives users a little
+ longer to complete their migration while still using newer hardware, but
+ with the understanding that this really is the end of the line for v1
+ cgroups.
  
  [Fix]
  
- Re-enable CONFIG_MEMCG_V1 in the 6.14 Noble config.
+ Re-enable the missing v1 controllers in the 6.14 Noble config.
+ 
+ In 6.8 there were 14 controllers.  In the current 6.14 config there are
+ also 14 controllers.  However, the difference is that in the current
+ 6.14 build the dmem controller was added, and the cpuset and memory
+ controllers were removed.
+ 
+ Diffing both the /proc/cgroups and configs between the 6.14 and 6.8
+ releases gives:
+ 
+   -CPUSETS_V1 n
+   -MEMCG_V1 n
+ 
+ These differences were also corroborated via source inspection.  Changes
+ in 6.12 moved these controllers behind config options (and ifdefs) that
+ default to n, so make olddefconfig leaves them disabled.
+ 
+ In order to ensure that 6.14 has the same v1 cgroup controllers enabled
+ as 6.8, enable both CONFIG_CPUSETS_V1 and CONFIG_MEMCG_V1 for Noble.
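+ 
+ On a kernel built with this change, both options should then show up as
+ enabled in the shipped config:
+ 
+    $ grep -E 'CONFIG_(CPUSETS|MEMCG)_V1=' /boot/config-$(uname -r)
+    CONFIG_CPUSETS_V1=y
+    CONFIG_MEMCG_V1=y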
  
  [Test]
  
- Booted a kernel with this change and validated that v1 memcgs were
- present again.
+ Booted a kernel with this change and validated that the missing v1
+ controllers (memory and cpuset) were present again.
  
- [Potential Regression]
+ Before:
  
- The regression potential here should be low since this merely restores
- and existing feature that most users were not using but that a few still
- depended upon.
+    $ grep memory /proc/cgroups 
+    $ grep cpuset /proc/cgroups 
+    
+  with v1 cgroups enabled:
+    
+    $ mount | grep cgroup | grep memory
+    $ mount | grep cgroup | grep cpuset
+    
+    $ ls /sys/fs/cgroup | grep memory
+    $ ls /sys/fs/cgroup | grep cpuset
+ 
+ After:
+ 
+    $ grep memory /proc/cgroups 
+    memory     0       88      1
+    $ grep cpuset /proc/cgroups 
+    cpuset     0       88      1
+    
+  with v1 cgroups enabled:
+    
+    $ mount | grep cgroup | grep memory
+    cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
+    $ mount | grep cgroup | grep cpuset
+    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
+    
+    $ ls /sys/fs/cgroup | grep memory
+    memory
+    $ ls /sys/fs/cgroup | grep cpuset
+    cpuset
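+ 
+ For reference, one way to get the v1 hierarchies mounted as in the
+ transcript above is to boot with systemd.unified_cgroup_hierarchy=false,
+ or to mount a controller manually:
+ 
+    $ sudo mkdir -p /tmp/memcg-v1
+    $ sudo mount -t cgroup -o memory cgroup /tmp/memcg-v1
+ 
+ provided the controller is not already bound to another hierarchy.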
+ 
+ A config diff of the previous build versus a build cranked from these
+ patches:
+ 
+  CPUSETS_V1 n -> y
+  MEMCG_V1 n -> y
+ 
+ [Where problems can occur]
+ 
+ Since these changes re-introduce code that was disabled via ifdef,
+ there's a possible increase in binary size.  Compared with an otherwise
+ identical build that has these config flags disabled, the compressed
+ artifact size for an x86 vmlinuz increases by 16k.
+ 
+ The difference in uncompressed memory usage after boot is an increase of
+ 40k, broken down as 21k code, 19k rwdata, 12k rodata, 8k init, -28k
+ bss, and 8k reserved.
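+ 
+ These per-section figures follow the breakdown printed in the kernel's
+ boot-time memory summary, so the comparison can be repeated on any pair
+ of boots with:
+ 
+    $ dmesg | grep 'Memory:'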
+ 
+ The primary remaining risk is future breakage of these interfaces,
+ since they are no longer part of the default configuration.  If they
+ are not exercised in upstream's test matrix, there is additional
+ potential for unnoticed breakage.  However, the author is not aware of
+ any actual v1 cgroups breakage at the time this patch is being
+ submitted.
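+ 
+ Note also that systemd on Noble defaults to the unified (v2) hierarchy,
+ so the v1 controllers are not mounted unless explicitly requested, and
+ deployments that want to rule out v1 entirely can keep it disabled at
+ boot with the existing kernel parameter:
+ 
+    cgroup_no_v1=all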

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.14 in Ubuntu.
https://bugs.launchpad.net/bugs/2122368

Title:
  enable MEMCG_V1 and CPUSETS_V1 on Noble HWE

Status in linux-hwe-6.14 package in Ubuntu:
  New
Status in linux-hwe-6.14 source package in Noble:
  In Progress
