On 6/18/26 13:43, Wandun wrote:
> 
> 
> On 6/18/26 02:52, Vlastimil Babka (SUSE) wrote:
>> On 6/4/26 04:38, Wandun Chen wrote:
>>> From: Wandun Chen <[email protected]>
>>>
>>> compact_unevictable_allowed defaults to 0 under PREEMPT_RT, so
>>> isolate_migratepages_block() skips folios with PG_unevictable set.
>>> However, mlock_folio() sets PG_mlocked immediately but defers
>>> PG_unevictable to mlock_folio_batch(), resulting in a folio with
>>> PG_mlocked=1 but PG_unevictable=0. Compaction will isolate such a
>>> folio.
>>>
>>> Fix by checking folio_test_mlocked() together with the existing
>>> folio_test_unevictable() check.
>>>
>>> A similar issue has been reported by Alexander Krabler on a 6.12-rt
>>> aarch64 system. Vlastimil suggested checking the mlocked flag [1].
>>>
>>> Reported-by: Alexander Krabler <[email protected]>
>>> Closes: https://lore.kernel.org/all/du0pr01mb10385345f7153f3341009818882...@du0pr01mb10385.eurprd01.prod.exchangelabs.com/
>>> Suggested-by: Vlastimil Babka <[email protected]>
>>> Signed-off-by: Wandun Chen <[email protected]>
>>> Link: https://lore.kernel.org/all/[email protected]/ [1]
>> 
>> Well in that thread, Hugh doubted my suggestion and then it seems we didn't
>> conclude anything. Did you actually observe in practice the issue that
>> Alexander had, and that this patch fixed it, or is that theoretical?
>> 
> Yes, I wrote a test case that can reproduce it in a few seconds.
> 
> The test case contains 3 steps:
> 1. mlockall
> 2. mmap a 2GB file and trigger write page faults on it;
> 3. during step 2, trigger compaction via /proc/sys/vm/compact_memory
> 
> 
> My reproduction environment is qemu with 4GB RAM, 8 cores, aarch64,
> preempt_rt, and includes the tracepoint from patch 02.
> After running the reproduction program for a few seconds, the
> following output appears.

Ah, nice.

> repro-403     [004] ....1   101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
> repro-403     [004] ....1   101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
> repro-403     [004] ....1   101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
> repro-403     [004] ....1   101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
> repro-403     [004] ....1   101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
> repro-403     [004] ....1   101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked
> 
> 
> Unfortunately, I recently found that there is still a bug in the
> fix patch. Setting mlocked in the mlock_folio function could happen
> even after the page is successfully isolated, so it still cannot
> prevent migration. Because of this, I need to think more about how
> to fix it.
> 
> Perhaps we should double-check whether the page is mlocked during
> the actual migration phase.

So IIUC the isolation+migration might start between the time the folio is
allocated and the time it is mlocked? In that case the check during migration
could still be racy, and if the page is isolated, it's already bad for the RT
process.

So this would only be a short-term problem after the mlockall, but we don't
have a way for the RT process to know the moment it's all settled, right?
Probably the proper solution would be for mlock[all]() itself to wait for an
isolated page, and only continue once it knows it can't be isolated anymore.
This might, however, go against some of the folio batching optimizations?

> What do you think of this best-effort approach?
> 
> 
> Best regards,
> Wandun
> 
> 
> 
> 
> 
> The full reproducer is as below:
> 
> /* gcc repro.c -o repro -lpthread */
> 
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> 
> #define PAGE_SIZE       4096
> #define NR_PAGES        32
> #define FILE_SIZE       (2ULL * 1024 * 1024 * 1024)
> 
> static void *worker_fn(void *arg)
> {
>       int fd = (long)arg;
>       size_t len = (size_t)FILE_SIZE;
>       char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>       if (p == MAP_FAILED)
>               return NULL;
> 
>       for (size_t off = 0; off + NR_PAGES * PAGE_SIZE <= len;
>            off += NR_PAGES * PAGE_SIZE) {
>               for (int i = 0; i < NR_PAGES; i++)
>                       p[off + i * PAGE_SIZE] = 1;
>               usleep(200);
>       }
> 
>       munmap(p, len);
>       return NULL;
> }
> 
> static void *compact_fn(void *arg)
> {
>       (void)arg;
>       int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
>       if (fd < 0)
>               return NULL;
> 
>       while (1) {
>               if (write(fd, "1", 1) < 0) {}
>               usleep(5000);
>       }
> }
> 
> int main(void)
> {
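>       /* Without a successful mlockall() the test is meaningless. */
>       if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
>               perror("mlockall");
>               return 1;
>       }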
> 
>       int fd = open("./repro_largefile.dat", O_RDWR | O_CREAT, 0600);
>       if (fd < 0)
>               return 1;
>       unlink("./repro_largefile.dat");
>       if (ftruncate(fd, (off_t)FILE_SIZE) < 0)
>               return 1;
> 
>       printf("repro_largefile: 1 worker, %d pages/batch, Ctrl-C to stop\n",
>              NR_PAGES);
> 
>       pthread_t compact, worker;
>       pthread_create(&compact, NULL, compact_fn, NULL);
>       pthread_create(&worker, NULL, worker_fn, (void *)(long)fd);
> 
>       pthread_join(worker, NULL);
>       return 0;
> }
> 
>>> ---
>>>  mm/compaction.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index b776f35ad020..7e07b792bcb5 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1116,7 +1116,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>             is_unevictable = folio_test_unevictable(folio);
>>>  
>>>             /* Compaction might skip unevictable pages but CMA takes them */
>>> -           if (!(mode & ISOLATE_UNEVICTABLE) && is_unevictable)
>>> +           if (!(mode & ISOLATE_UNEVICTABLE) &&
>>> +               (is_unevictable || folio_test_mlocked(folio)))
>>>                     goto isolate_fail_put;
>>>  
>>>             /*
>> 
> 

