(A temporary hack, to be dropped after the rebase on top of RHEL 8.4.)

In certain cases, the number of managed pages in a memory zone becomes less than the number of free pages, leading to a negative or overly large 'MemUsed' value (managed_pages - free_pages) shown in /sys/devices/system/node/node*/meminfo.
It is suspected that the number of managed pages is calculated incorrectly for some reason on NUMA systems. However, the root cause is unclear.

The patch detects such conditions, outputs a message to dmesg and 'corrects' the value of managed_pages used to prepare data for 'meminfo' files. It does not change zone->managed_pages, only the stats shown to the users.

So, it is not the fix but, instead, it just hides the problem and allows our testing to continue.

The patch was prepared in the scope of https://jira.sw.ru/browse/PSBM-129304.

The problem seems to be fixed in the kernel from RHEL 8.4, so the patch should be dropped after rebase on top of that kernel.

Signed-off-by: Evgenii Shatokhin <[email protected]>
---
 mm/page_alloc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fca875aa8ab3..874de3912942 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5009,8 +5009,25 @@ void si_meminfo_node(struct sysinfo *val, int nid)
 	unsigned long free_highpages = 0;
 	pg_data_t *pgdat = NODE_DATA(nid);
 
-	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-		managed_pages += pgdat->node_zones[zone_type].managed_pages;
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
+		struct zone *zone = &pgdat->node_zones[zone_type];
+		unsigned long nr_managed = zone->managed_pages;
+		unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
+
+		/*
+		 * HACK, PSBM-129304
+		 * In certain cases, the number of managed pages becomes less
+		 * than the number of free pages in a zone, leading to negative
+		 * or overly large 'MemUsed' (managed_pages - free_pages).
+		 * 'Correct' the numbers until the root cause is resolved.
+		 */
+		if (nr_managed < nr_free) {
+			pr_notice_once("Node %d, zone %d: managed_pages (%lu) is less than free_pages (%lu)\n",
+				       nid, zone_type, nr_managed, nr_free);
+			nr_managed = nr_free;
+		}
+		managed_pages += nr_managed;
+	}
 	val->totalram = managed_pages;
 	val->sharedram = node_page_state(pgdat, NR_SHMEM);
 	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
-- 
2.29.0

_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
