On 2025-12-18 13:00, Mark Brown wrote:
On Sat, Dec 13, 2025 at 01:56:07PM -0500, Mathieu Desnoyers wrote:

Use hierarchical per-cpu counters for rss tracking to fix the per-mm RSS
tracking which has become too inaccurate for OOM killer purposes on
large many-core systems.

We're seeing boot time crashes in -next on the Arm FVP and Ampere Altra
which bisect to this patch (commit 240587b6cca2822d).  Many other
platforms aren't showing this, though we do have some other breakage in
-next which might be obscuring things.  We get a NULL dereference:

[    2.481143] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

...

[    2.485036] Call trace:
[    2.485094]  acct_account_cputime+0x40/0xa4 (P)
[    2.485226]  irqtime_account_process_tick+0x17c/0x1d8
[    2.485382]  account_process_tick+0x12c/0x148
[    2.485531]  update_process_times+0x28/0xdc
[    2.485656]  tick_nohz_handler+0xbc/0x1bc
[    2.485809]  __hrtimer_run_queues+0x130/0x184

I note that __acct_update_integrals() is being called from here, most
likely inlined, and does get_mm_rss().  That uses get_mm_counter() which
we've updated in this patch, though I didn't spot the specific issue
yet.

There is something fishy in mm/init-mm.c:init_mm. The initialization
of

       .cpu_bitmap     = CPU_BITS_NONE,

keeps room for an NR_CPUS cpumask in that structure, but does not take
into account the extra room now needed for mm_cid and the hierarchical
per-cpu counters.

In mm_cache_init() we have:

       mm_size = sizeof(struct mm_struct) + cpumask_size() + mm_cid_size() + get_rss_stat_items_size();

So AFAIU we should extend the space reserved at the end of init_mm to
include room for mm_cid_size() (2 * cpumask_size()), whose absence would
be an upstream bug, and now also room for get_rss_stat_items_size()
(which is an issue specific to -next due to hierarchical per-cpu
counters).
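
To make the size mismatch concrete, here is a minimal userspace sketch
(not kernel code). All demo_* names and DEMO_* constants are hypothetical
stand-ins, and the counter-area size is a placeholder, not the real
get_rss_stat_items_size() value. It only compares the trailing room a
dynamically sized instance gets with the room a static instance gets when
just one cpumask is reserved:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real values come from the kernel config. */
#define DEMO_NR_CPUS 64
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

static_assert(DEMO_NR_CPUS > 0, "need at least one CPU");

/* Toy struct whose trailing storage is shared by several areas. */
struct demo_mm {
	int users;
	/* cpumask first, then the cid masks, then the rss counters */
	unsigned long cpu_bitmap[];
};

/* Bytes needed for one cpumask. */
static size_t demo_cpumask_size(void)
{
	return DEMO_BITS_TO_LONGS(DEMO_NR_CPUS) * sizeof(unsigned long);
}

/* Two more cpumasks, as described in the text above. */
static size_t demo_mm_cid_size(void)
{
	return 2 * demo_cpumask_size();
}

/* Placeholder size for the counter area (assumption, not the real value). */
static size_t demo_rss_items_size(void)
{
	return 4 * 64;
}

/* Trailing bytes a dynamically allocated instance is sized for. */
static size_t demo_dynamic_tail(void)
{
	return demo_cpumask_size() + demo_mm_cid_size() + demo_rss_items_size();
}

/* Trailing bytes a static instance gets if only one cpumask is reserved. */
static size_t demo_static_tail(void)
{
	return demo_cpumask_size();
}

/* How many bytes the static instance is short by. */
static size_t demo_static_shortfall(void)
{
	return demo_dynamic_tail() - demo_static_tail();
}
```

Any code that indexes past the first cpumask of the static instance would
be reading or writing memory that was never reserved for it.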

An ugly work-around that may work (and then we can improve on this),
at the end of mm/init-mm.c:init_mm (completely untested):

       .cpu_bitmap     = { [0 ... ((3*BITS_TO_LONGS(NR_CPUS))-1 + ((69905 * NR_MM_COUNTERS * 64) / BYTES_PER_LONG))] = 0UL },
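
For a feel of how much room that range designator reserves, here is a
small userspace sketch of the arithmetic. The wa_* helpers are
hypothetical, and the inputs used in the note below (64 CPUs, 4 counters,
8-byte longs) are assumptions for illustration, not values taken from any
particular config:

```c
#include <assert.h>

/* Number of longs needed to hold nr_bits bits (mirrors BITS_TO_LONGS). */
static unsigned long long wa_bits_to_longs(unsigned long long nr_bits,
					   unsigned long long bits_per_long)
{
	return (nr_bits + bits_per_long - 1) / bits_per_long;
}

/*
 * Element count of the range [0 ... last], where
 *   last = 3 * BITS_TO_LONGS(nr_cpus) - 1
 *          + (69905 * nr_mm_counters * 64) / bytes_per_long
 * as written in the work-around above.
 */
static unsigned long long wa_reserved_longs(unsigned long long nr_cpus,
					    unsigned long long nr_mm_counters,
					    unsigned long long bytes_per_long)
{
	unsigned long long bits_per_long, last;

	assert(bytes_per_long > 0);
	bits_per_long = bytes_per_long * 8;
	last = 3 * wa_bits_to_longs(nr_cpus, bits_per_long) - 1 +
	       (69905ULL * nr_mm_counters * 64) / bytes_per_long;
	return last + 1;
}

/* Same reservation expressed in bytes. */
static unsigned long long wa_reserved_bytes(unsigned long long nr_cpus,
					    unsigned long long nr_mm_counters,
					    unsigned long long bytes_per_long)
{
	return wa_reserved_longs(nr_cpus, nr_mm_counters, bytes_per_long) *
	       bytes_per_long;
}
```

With those assumed inputs, wa_reserved_bytes(64, 4, 8) comes to 17895704
bytes, i.e. about 17.9 MB of trailing storage, which is one reason to
treat this only as a stop-gap.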

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
