On 21.11.2025 10:03, Konstantin Belousov wrote:
On Fri, Nov 21, 2025 at 10:36:42AM +0200, Konstantin Belousov wrote:
On Fri, Nov 21, 2025 at 08:12:55AM +0100, Michal Meloun wrote:
I have confirmed that the jemalloc assertions are caused by an mmap() failure:
it can return non-zeroed page(s) for mmap(MAP_ANON), which is clearly a bug.
I have confirmed this on native ARMv7, and according to Mark, it is also
reproducible on ARM32 and i386 jails. I think I saw it also on a
memory-constrained (4 GB) aarch64, but I cannot reproduce it yet.
Does anybody have an idea how to identify the VM faults associated with anon
mmap, so that this failure can be detected in the kernel? Or any other hint?
I think it would be much more visible if freshly allocated anonymous pages
were corrupted. A similar mechanism to get zeroed pages is used to get
fresh page table pages, and corruption there must cause a lot of kernel
page faults with 'invalid PTE bit' hw reports.
But of course everything is possible.
The VM has an optimization where we track known-to-be-zeroed free pages
separately, by marking them with the PG_ZERO flag. If an allocation needs a
zeroed page and the flag is set, we skip calling pmap_zero_page() on it.
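The optimization described above can be illustrated with a small user-space
model. This is only a sketch, not the kernel code: struct model_page,
MODEL_PG_ZERO, model_zero_page() and model_alloc_zeroed() are hypothetical
names invented for the illustration. It shows why a page that is wrongly
marked as zeroed is handed out with stale contents.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_PG_ZERO   0x1	/* hypothetical flag value, model only */

/* Hypothetical stand-in for a physical page and its flags. */
struct model_page {
	unsigned flags;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Stand-in for pmap_zero_page(): clear the page contents. */
static void
model_zero_page(struct model_page *m)
{
	memset(m->data, 0, sizeof(m->data));
}

/*
 * Hand out a page that the caller expects to be zeroed.
 * If the page is marked as known-zero, the zeroing pass is skipped.
 * Returns true when zeroing was skipped.
 */
static bool
model_alloc_zeroed(struct model_page *m)
{
	bool skipped = (m->flags & MODEL_PG_ZERO) != 0;

	if (!skipped)
		model_zero_page(m);
	/* The page is about to be used, so it is no longer known-zero. */
	m->flags &= ~MODEL_PG_ZERO;
	return (skipped);
}
```

In this model, a page whose flag is set while its contents are not actually
zero passes through model_alloc_zeroed() unchanged, which is the kind of
failure the diagnostic checks try to catch.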
Also, in vm_page_free_prep() when we are told that the page is zeroed,
with DIAGNOSTIC enabled, on amd64 and arm64, we do check for that.
So let's add a slow check to the vm_fault code that a supposedly zeroed page
is indeed zeroed. Can you try to catch the issue with the patch applied
and DIAGNOSTIC enabled? The patch is arch-agnostic and I believe it should
work on armv7, although it will obviously cause a slowdown.
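For illustration, a slow "is this page really zeroed" check can be as simple
as a word-by-word scan. The sketch below is not the code from the patch; the
name page_first_nonzero() and the fixed SKETCH_PAGE_SIZE are assumptions made
for a self-contained user-space example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

/*
 * Scan one page word by word.
 * Returns the byte offset of the first nonzero word, or -1 if the
 * whole page is zero. The page must be suitably aligned for long.
 */
static long
page_first_nonzero(const void *page)
{
	const unsigned long *w = page;
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(*w); i++) {
		if (w[i] != 0)
			return ((long)(i * sizeof(*w)));
	}
	return (-1);
}
```

A diagnostic build could call such a helper on every page that is claimed to
be zeroed and panic or log when the result is not -1. The full scan of each
page is what makes the check slow.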
I also made the vm_page_free_prep() check MI.
Please use https://reviews.freebsd.org/D53850 instead of the previous
patch.
Hi Kib,
I was unexpectedly out of the office today, so I only got back to
debugging a moment ago and couldn't devote much time to it today.
First, many thanks for your efforts, but this check doesn't trigger when
the problem occurs.
To be more precise, a test case
on a fresh kernel (d8bfcacd12aba73188c44a157c707908e275825d)
with PMAP_DEBUG defined in pmap-v6.c and with
a trivial zero check of the first page at this place ->
https://cgit.freebsd.org/src/tree/contrib/jemalloc/src/pages.c#n281
causes this failure:
__je_pages_map: addr: 0x0, ret: 0x3087b000, size: 4096, alignment: 4096,
prot: 0x00000003, flags: 0x0C001002
__je_pages_map: i: 0, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 23, p[i]: 0x308E5F94, p: 0x3087b000
__je_pages_map: i: 27, p[i]: 0x308E23F4, p: 0x3087b000
__je_pages_map: i: 29, p[i]: 0x308F077C, p: 0x3087b000
__je_pages_map: i: 30, p[i]: 0x308C3444, p: 0x3087b000
__je_pages_map: i: 33, p[i]: 0x308C57BC, p: 0x3087b000
__je_pages_map: i: 36, p[i]: 0x308E41E4, p: 0x3087b000
__je_pages_map: i: 39, p[i]: 0x308EA2E4, p: 0x3087b000
__je_pages_map: i: 42, p[i]: 0x308EC444, p: 0x3087b000
__je_pages_map: i: 44, p[i]: 0x308EE60C, p: 0x3087b000
__je_pages_map: i: 47, p[i]: 0x308C7AF4, p: 0x3087b000
__je_pages_map: i: 58, p[i]: 0x308C9F24, p: 0x3087b000
__je_pages_map: i: 79, p[i]: 0x308E8114, p: 0x3087b000
__je_pages_map: i: 80, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 160, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 240, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 320, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 400, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 480, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 560, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 640, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 720, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 800, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 880, p[i]: 0xFFFFFFFF, p: 0x3087b000
__je_pages_map: i: 960, p[i]: 0xFFFFFFFF, p: 0x3087b000
The pattern looks interesting; it is not exactly the same in all cases, but
it is similar. Another example:
__je_pages_map: addr: 0x0, ret: 0x32d4d000, size: 4096, alignment: 4096,
prot: 0x00000003, flags: 0x0C001002
__je_pages_map: i: 64, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 144, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 224, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 304, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 384, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 408, p[i]: 0x32CE5854, p: 0x32d4d000
__je_pages_map: i: 416, p[i]: 0x32CE5064, p: 0x32d4d000
__je_pages_map: i: 429, p[i]: 0x32CE5BD4, p: 0x32d4d000
__je_pages_map: i: 455, p[i]: 0x32CE5FD4, p: 0x32d4d000
__je_pages_map: i: 464, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 544, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 589, p[i]: 0x32CE6744, p: 0x32d4d000
__je_pages_map: i: 591, p[i]: 0x32CE6B04, p: 0x32d4d000
__je_pages_map: i: 603, p[i]: 0x32CE6EC4, p: 0x32d4d000
__je_pages_map: i: 624, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 704, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 784, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 864, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 944, p[i]: 0xFFFFFFFF, p: 0x32d4d000
__je_pages_map: i: 969, p[i]: 0x40F8917C, p: 0x32d4d000
__je_pages_map: i: 971, p[i]: 0x40EB9F0C, p: 0x32d4d000
__je_pages_map: i: 973, p[i]: 0x40D164CC, p: 0x32d4d000
__je_pages_map: i: 978, p[i]: 0x40F47EFC, p: 0x32d4d000
__je_pages_map: i: 980, p[i]: 0x4116768C, p: 0x32d4d000
__je_pages_map: i: 996, p[i]: 0x3E5430FC, p: 0x32d4d000
__je_pages_map: i: 1002, p[i]: 0x40F88FAC, p: 0x32d4d000
__je_pages_map: i: 1006, p[i]: 0x40D1669C, p: 0x32d4d000
__je_pages_map: i: 1011, p[i]: 0x40F47D2C, p: 0x32d4d000
__je_pages_map: i: 1012, p[i]: 0x3E542F2C, p: 0x32d4d000
__je_pages_map: i: 1021, p[i]: 0x40EB9D3C, p: 0x32d4d000
__je_pages_map: i: 1022, p[i]: 0x4116785C, p: 0x32d4d000
Still searching,
Michal