On 3/10/26 7:56 PM, Huang, Ying wrote:
"JP Kobryn (Meta)" <[email protected]> writes:

On 3/7/26 4:27 AM, Huang, Ying wrote:
"JP Kobryn (Meta)" <[email protected]> writes:

When investigating pressure on a NUMA node, there is no straightforward way
to determine which policies are driving allocations to it.

Add per-policy page allocation counters as new node stat items. These
counters track allocations to nodes and also whether the allocations were
intentional or fallbacks.

The new stats follow the existing numa hit/miss/foreign style and have the
following meanings:

    hit
      - for BIND and PREFERRED_MANY, allocation succeeded on node in nodemask
      - for other policies, allocation succeeded on intended node
      - counted on the node of the allocation
    miss
      - allocation intended for other node, but happened on this one
      - counted on other node
    foreign
      - allocation intended on this node, but happened on other node
      - counted on this node

Counters are exposed per-memcg, per-node in memory.numa_stat and globally
in /proc/vmstat.
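The counting rules above can be modeled in a small sketch. This is
illustrative only: `account` and the event tuples are made-up names,
not the kernel interface or the actual stat names from the patch.

```python
# Illustrative model of the hit/miss/foreign rules described above.
# "intended" is the node (or nodemask) the policy targeted;
# "alloc_node" is where the page actually came from.

def account(intended, alloc_node):
    """Return the list of (counter, node) increments for one allocation."""
    if alloc_node in intended:
        # hit: allocation succeeded on an intended node,
        # counted on the node of the allocation
        return [("hit", alloc_node)]
    # miss: counted on the node that actually served the allocation
    # foreign: counted on the node(s) the allocation was intended for
    return [("miss", alloc_node)] + [("foreign", n) for n in sorted(intended)]

# Allocation landing on the intended node vs. a fallback to node 1:
print(account({0}, 0))  # [('hit', 0)]
print(account({0}, 1))  # [('miss', 1), ('foreign', 0)]
```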
IMHO, it may be better to describe your workflow as an example of how
to use the newly added statistics.  That would explain why we need
them.  For example, what you have described in
https://lore.kernel.org/linux-mm/[email protected]/

1) Pressure/OOMs reported while system-wide memory is free.
2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
    down node(s) under pressure. They become available in
    /sys/devices/system/node/nodeN/vmstat.
3) Check per-policy allocation counters (this patch) on that node to
    find what policy was driving it. Same readout at nodeN/vmstat.
4) Now use /proc/*/numa_maps to identify tasks using the policy.


Good call. I'll add a workflow adapted for the current approach in the
next revision. I included it in another response in this thread, but
I'll repeat it here since it makes it easier to answer your question
below.

1) Pressure/OOMs reported while system-wide memory is free.
2) Check /proc/zoneinfo or per-node stats in .../nodeN/vmstat to narrow
    down node(s) under pressure.
3) Check per-policy hit/miss/foreign counters (added by this patch) on
    node(s) to see what policy is driving allocations there (intentional
    vs fallback).
4) Use /proc/*/numa_maps to identify tasks using the policy.
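Steps 2) and 3) above can be scripted roughly as below. The pgscan/pgsteal
names exist in vmstat today, but the mpol_* counter names in the sample
are placeholders, since the actual stat names are defined by the patch.

```python
# Sketch of steps 2-3: parse a nodeN/vmstat snapshot and pull out the
# reclaim counters plus the (hypothetical) per-policy counters.

def parse_vmstat(text, prefixes=("pgscan", "pgsteal", "mpol_")):
    """Return {name: value} for vmstat lines matching the given prefixes."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name.startswith(prefixes):
            stats[name] = int(value)
    return stats

# Stand-in for /sys/devices/system/node/nodeN/vmstat; mpol_* names
# are invented for this sketch.
sample = """\
nr_free_pages 1024
pgscan_kswapd 50000
pgsteal_kswapd 48000
mpol_bind_hit 10
mpol_bind_miss 900000
"""

# High pgscan alongside a large miss count points at fallback traffic
# from a policy rather than intentional placement.
print(parse_vmstat(sample))
```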

One question.  If we have to search /proc/*/numa_maps, why can't we
find all necessary information via /proc/*/numa_maps?  For example,
which VMA uses the most pages on the node?  Which policy is used in the
VMA? ...


There's a gap in the flow of information if we go straight from a node
in question to numa_maps. Without step 3 above, we can't distinguish
whether pages landed there intentionally, as a fallback, or were
migrated sometime after the allocation. These new counters track the
results of allocations at the time they happen, preserving that
information regardless of what may happen later on.

Sorry for late reply.

IMHO, step 3) doesn't add much to the flow.  It only counts
allocations, not migrations, freeing, etc.

That logic would undermine other existing stats as well: the existing
numa hit/miss/foreign counters also only count allocations.

I'm afraid that it may be misleading.  For example, if a lot of pages
are allocated with a mempolicy and then freed, the counters still
reflect those allocations even though the pages are gone.
/proc/*/numa_maps is a more useful source of information for the goal.

numa_maps only shows live snapshots with no attribution. Even if we
tracked them over time, there's no way to determine whether the
allocations exist as a result of a policy decision.
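That limitation is visible in the numa_maps format itself: a line gives
the policy and the current per-node page counts, but nothing about how
the pages got there. A parsing sketch (the sample line is illustrative):

```python
# A /proc/<pid>/numa_maps line carries the VMA address, the policy,
# and current per-node page counts (N0=..., N1=...), i.e. a snapshot
# with no allocation attribution.

def parse_numa_maps_line(line):
    """Return (address, policy, {node: pages}) for one numa_maps line."""
    fields = line.split()
    addr, policy = fields[0], fields[1]
    pages = {f.split("=")[0]: int(f.split("=")[1])
             for f in fields[2:] if f.startswith("N") and "=" in f}
    return addr, policy, pages

# Illustrative sample line, not taken from a real system:
line = "7f2b4c000000 bind:0 anon=2048 dirty=2048 N0=1024 N1=1024 kernelpagesize_kB=4"
addr, policy, pages = parse_numa_maps_line(line)
print(policy, pages)  # bind:0 {'N0': 1024, 'N1': 1024}
```

Here the N1 pages violate the bind:0 policy, but the snapshot alone
cannot say whether they were fallback allocations or migrated later.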

To get all necessary information, I think that more thorough
tracing is necessary.

Tracking other sources of pages on a node (migration, etc) is
beyond the goal of this patch.
