On 2/18/26 10:02 AM, JP Kobryn (Meta) wrote:
On 2/18/26 12:54 AM, Vlastimil Babka (SUSE) wrote:
On 2/18/26 04:29, JP Kobryn (Meta) wrote:
From: JP Kobryn <[email protected]>
There are situations where reclaim kicks in on a system with free memory.
One possible cause is a NUMA imbalance scenario where one or more nodes
are under pressure. It would help if we could easily identify such nodes.
Move the pgscan and pgsteal counters from vm_event_item to node_stat_item
to provide per-node reclaim visibility. With these counters as node stats,
the values are now displayed in the per-node section of /proc/zoneinfo,
which allows for quick identification of the affected nodes.
/proc/vmstat continues to report the same counters, aggregated across all
nodes. However, the position of these items in the readout changes, since
they move from the vm events section to the node stats section.
Memcg accounting of these counters is preserved. The relocated counters
remain visible in memory.stat alongside the existing aggregate pgscan and
pgsteal counters.
However, this change affects how the global counters are accumulated.
Previously, the global event count update was gated on !cgroup_reclaim(),
excluding memcg-based reclaim from /proc/vmstat. Now that
mod_lruvec_state() is being used to update the counters, the global
counters will include all reclaim. This is consistent with how pgdemote
counters are already tracked.
Hm, so that leaves PGREFILL (scanned in the active list) as the odd one
out, right? It is not per-node, and its global stat is still gated on
!cgroup_reclaim(). Should we change it too for full consistency?
I'm fine with adding coverage for the active list side as well. For
completeness, I could also include PGDEACTIVATE.
Actually, I see PGDEACTIVATE is not gated so I'll leave that one out.
I'll send v3 and include PGREFILL.