On 3/8/26 12:20 PM, Gregory Price wrote:
On Sat, Mar 07, 2026 at 08:27:22PM +0800, Huang, Ying wrote:
"JP Kobryn (Meta)" <[email protected]> writes:
hit
- for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
- for other policies, allocation succeeded on intended node
- counted on the node of the allocation
miss
- allocation intended for other node, but happened on this one
- counted on other node
foreign
- allocation intended on this node, but happened on other node
- counted on this node
Counters are exposed per-memcg, per-node in memory.numa_stat and globally
in /proc/vmstat.
IMHO, it may be better to describe your workflow as an example of how to
use the newly added statistics. That would show why we need them. For
example, what you have described in
https://lore.kernel.org/linux-mm/[email protected]/
1) Pressure/OOMs reported while system-wide memory is free.
2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
down node(s) under pressure. They become available in
/sys/devices/system/node/nodeN/vmstat.
3) Check per-policy allocation counters (this patch) on that node to
find what policy was driving it. Same readout at nodeN/vmstat.
4) Now use /proc/*/numa_maps to identify tasks using the policy.
One question. If we have to search /proc/*/numa_maps, why can't we
find all necessary information via /proc/*/numa_maps? For example,
which VMA uses the most pages on the node? Which policy is used in the
VMA? ...
I am a little confused by this too - consider:
7f85dca86000 interleave=0,1 file=[...] mapped=14 mapmax=5 N0=3 N1=10 ...
Is N0=3 and N1=10 because we did those allocations according to the
policy but got fallbacks, or is it that way because we did 7/7 and
then things got migrated due to pressure?
That ambiguity should be resolved with this patch.
Do these counters let you capture that, or does it just make the numbers
even more meaningless?
You would be able to look at the new counters and see that the
allocations were distributed evenly at the time of allocation. If an
imbalance is observed afterward we would know that it was due to
migration.
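
For reference, the numa_maps readout being compared against can be pulled
apart with something like the sketch below. It is only an illustration: the
function name is made up, it assumes whitespace-separated fields with the
policy in the second column, and it treats any NX=count field as the number of
pages currently resident on node X. It says nothing about where pages were
originally allocated, which is the gap the new counters are meant to cover.

```python
import re

def parse_numa_maps_line(line):
    """Split one numa_maps line into (address, policy, {node: pages}).

    Hypothetical helper: assumes fields are whitespace-separated and that
    the second field is the policy string.
    """
    fields = line.split()
    address, policy = fields[0], fields[1]
    per_node = {}
    for field in fields[2:]:
        # Per-node residency fields look like N0=3, N1=10, ...
        m = re.fullmatch(r"N(\d+)=(\d+)", field)
        if m:
            per_node[int(m.group(1))] = int(m.group(2))
    return address, policy, per_node

# The example line quoted above.
addr, policy, pages = parse_numa_maps_line(
    "7f85dca86000 interleave=0,1 file=[...] mapped=14 mapmax=5 N0=3 N1=10"
)
```

This only reports current placement, so a skewed result here is consistent
with either fallback at allocation time or later migration.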
The page allocator will happily fall back to other nodes - even when a
mempolicy is present - because mempolicy is more of a suggestion than a
rule (unlike cpusets). So I'd like to understand how these
counters are intended to be used a little better.
That was the motivation for v2. In the previous rev, there was debate on
the lack of accounting for the fallback cases. So in this patch we
account for the fallbacks by making use of miss/foreign. In terms of how
the counters are intended to be used, the workflow would resemble:
1) Pressure/OOMs reported while system-wide memory is free.
2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
down node(s) under pressure.
3) Check per-policy hit/miss/foreign counters (added by this patch) on
node(s) to see what policy is driving allocations there (intentional
vs fallback).
4) Use /proc/*/numa_maps to identify tasks using the policy.
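
Steps 2 and 3 could be scripted roughly as below. This is a sketch under
assumptions: it relies on the per-node vmstat files using "name value" lines,
the helper names are invented for illustration, and the counter prefix is left
as a parameter because the exact names of the per-policy counters are not
spelled out in this mail.

```python
from pathlib import Path

def parse_vmstat(text):
    """Parse 'name value' lines into a dict, skipping anything malformed."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

def read_node_vmstat(root="/sys/devices/system/node"):
    """Return {node_id: stats} for every nodeN/vmstat found under root."""
    result = {}
    for path in sorted(Path(root).glob("node[0-9]*/vmstat")):
        node = int(path.parent.name[len("node"):])
        result[node] = parse_vmstat(path.read_text())
    return result

def rank_nodes(per_node, prefix):
    """Sum all counters starting with prefix and sort nodes, highest first."""
    totals = {
        node: sum(v for k, v in stats.items() if k.startswith(prefix))
        for node, stats in per_node.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Example with made-up numbers; the counter names are assumptions.
sample = {
    0: parse_vmstat("pgscan_kswapd 5\npgscan_direct 1\nnr_free_pages 100"),
    1: parse_vmstat("pgscan_kswapd 50\n"),
}
ranking = rank_nodes(sample, "pgscan")
```

On a live system, rank_nodes(read_node_vmstat(), "pgscan") would point at the
node(s) to look at in step 3, where the same call with the per-policy counter
prefix gives the intentional vs fallback split.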