On Fri, Feb 08, 2019 at 10:44:19PM +0000, Chris Down wrote:
> memory.stat and other files already consider subtrees in their output,
> and we should too in order to not present an inconsistent interface.
>
> The current situation is fairly confusing, because people interacting
> with cgroups expect hierarchical behaviour in the vein of memory.stat,
> cgroup.events, and other files. For example, this causes confusion when
> debugging reclaim events under low, as currently these always read "0"
> at non-leaf memcg nodes, which frequently causes people to misdiagnose
> breach behaviour. The same confusion applies to other counters in this
> file when debugging issues.
>
> Aggregation is done at write time instead of at read-time since these
> counters aren't hot (unlike memory.stat which is per-page, so it does it
> at read time), and it makes sense to bundle this with the file
> notifications.
>
> After this patch, events are propagated up the hierarchy:
>
> [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> low 0
> high 0
> max 0
> oom 0
> oom_kill 0
> [root@ktst ~]# systemd-run -p MemoryMax=1 true
> Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
> [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> low 0
> high 0
> max 7
> oom 1
> oom_kill 1
>
> As this is a change in behaviour, this can be reverted to the old
> behaviour by mounting with the `memory_localevents` flag set. However,
> we use the new behaviour by default as there's a lack of evidence that
> there are any current users of memory.events that would find this change
> undesirable.
>
> Signed-off-by: Chris Down <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Cc: Dennis Zhou <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

Acked-by: Johannes Weiner <[email protected]>
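
For readers following along, the write-time propagation described in the
changelog can be pictured with a toy userspace model. This is only a
sketch under stated assumptions, not the kernel code: the `Memcg` class,
`record_event()`, `read_events()` and the `local_only` switch (standing in
for the `memory_localevents` mount flag) are all hypothetical names made up
for illustration, and the leaf name `run-example.service` is invented.

```python
# Toy model of hierarchical memory.events accounting.
# All names here are hypothetical; this is not kernel code.

class Memcg:
    EVENTS = ("low", "high", "max", "oom", "oom_kill")

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # One counter per event type, all starting at zero.
        self.events = dict.fromkeys(self.EVENTS, 0)

    def record_event(self, event, local_only=False):
        """Count an event at write time.

        By default the event is charged to this group and every ancestor,
        so reading any level needs no tree walk. With local_only=True only
        this group is charged, modelling the old (local) behaviour.
        """
        node = self
        while node is not None:
            node.events[event] += 1
            if local_only:
                break
            node = node.parent

    def read_events(self):
        """Render the counters in the same 'name value' layout as the file."""
        return "\n".join(f"{name} {count}" for name, count in self.events.items())


# Build root -> system.slice -> a transient leaf, then replay the counts
# from the example in the changelog on the leaf.
root = Memcg("/")
system_slice = Memcg("system.slice", parent=root)
leaf = Memcg("run-example.service", parent=system_slice)

for _ in range(7):
    leaf.record_event("max")
leaf.record_event("oom")
leaf.record_event("oom_kill")

print(system_slice.read_events())
```

In this model the read path is a plain lookup and all the work happens
when an event is recorded, which is a reasonable trade only because such
events are rare. Passing `local_only=True` leaves the ancestors at zero,
which corresponds to the "always reads 0 at non-leaf nodes" behaviour the
changelog describes.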

