Introduce hierarchical per-cpu counters and use them for RSS tracking. This fixes the per-mm RSS tracking, which has become too inaccurate for OOM killer purposes on large many-core systems.
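
For readers unfamiliar with the idea, here is a minimal single-threaded user-space sketch of a hierarchical counter. It is NOT the code from this series: every name (toy_counter_tree, toy_add, TOY_BATCH, ...), the fixed two-level layout and the carry thresholds are assumptions made for illustration only. It shows updates being absorbed by per-cpu leaves and carried up the tree once a threshold is reached, so the root holds an approximate sum with bounded error, while a precise sum walks every node.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_CPUS 8	/* hypothetical CPU count for this sketch */
#define TOY_BATCH   4	/* hypothetical carry threshold at the leaf level */

/* Toy tree: TOY_NR_CPUS leaves -> TOY_NR_CPUS / 2 mid nodes -> one root. */
struct toy_counter_tree {
	long leaf[TOY_NR_CPUS];
	long mid[TOY_NR_CPUS / 2];
	long root;	/* approximate sum, cheap to read */
};

/* Add inc on behalf of cpu; carry upward once a node reaches its threshold. */
static void toy_add(struct toy_counter_tree *t, int cpu, long inc)
{
	long *leaf, *mid;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	leaf = &t->leaf[cpu];
	mid = &t->mid[cpu / 2];

	*leaf += inc;
	if (labs(*leaf) < TOY_BATCH)
		return;		/* absorbed locally, root not touched */
	*mid += *leaf;
	*leaf = 0;
	if (labs(*mid) < 2 * TOY_BATCH)
		return;		/* absorbed at the intermediate level */
	t->root += *mid;
	*mid = 0;
}

/* Approximate sum: only reads the root. */
static long toy_approx_sum(const struct toy_counter_tree *t)
{
	return t->root;
}

/* Precise sum: walks every node of the tree. */
static long toy_precise_sum(const struct toy_counter_tree *t)
{
	long sum = t->root;
	int i;

	for (i = 0; i < TOY_NR_CPUS / 2; i++)
		sum += t->mid[i];
	for (i = 0; i < TOY_NR_CPUS; i++)
		sum += t->leaf[i];
	return sum;
}

/* Precise sum clamped at zero, for callers that cannot handle negatives. */
static long toy_precise_sum_positive(const struct toy_counter_tree *t)
{
	long sum = toy_precise_sum(t);

	return sum < 0 ? 0 : sum;
}
```

In this toy layout each leaf holds less than TOY_BATCH and each mid node less than 2 * TOY_BATCH in magnitude, so the approximate sum differs from the precise sum by at most 8 * 3 + 4 * 7 = 52. The real series uses per-cpu data and concurrent updates, which this sketch deliberately leaves out.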

The approach proposed here is to replace the current per-mm RSS tracking with hierarchical per-cpu counters, which bound the inaccuracy based on the system topology with O(N*logN).

Relevant delta since v7:

- Initialize the subsystem earlier in start_kernel() so it is ready before any mm is created.
- Introduce and use a precise sum positive API to cover the scenario where an unlucky precise sum iteration happens concurrently with a sequence of counter updates that makes it observe a negative sum.

Testing and feedback are welcome!

Thanks,

Mathieu

Cc: Andrew Morton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Martin Liu <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: [email protected]
Cc: Shakeel Butt <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Sweet Tea Dorminy <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: "Liam R . Howlett" <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Al Viro <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Yu Zhao <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Aboorva Devarajan <[email protected]>

Mathieu Desnoyers (2):
  lib: Introduce hierarchical per-cpu counters
  mm: Fix OOM killer inaccuracy on large many-core systems

 include/linux/mm.h                  |  10 +-
 include/linux/mm_types.h            |   4 +-
 include/linux/percpu_counter_tree.h | 217 +++++++++++++++
 include/trace/events/kmem.h         |   2 +-
 init/main.c                         |   2 +
 kernel/fork.c                       |  32 ++-
 lib/Makefile                        |   1 +
 lib/percpu_counter_tree.c           | 392 ++++++++++++++++++++++++++++
 8 files changed, 641 insertions(+), 19 deletions(-)
 create mode 100644 include/linux/percpu_counter_tree.h
 create mode 100644 lib/percpu_counter_tree.c

-- 
2.39.5
